28a1c4f33e seems to call an improper
ensure clause. [Bug #20655]
Rather than fixing it properly, I bet it would be much better to simply
revert that commit. Doing so removes unneeded complexity. Jumping into a
block called by a C function like Hash#each with callcc is the user's fault.
It does not need serious support.
[Feature #20646] Improve Socket.tcp
This is a proposed improvement to `Socket.tcp`, which has implemented Happy Eyeballs version 2 (RFC8305) in PR9374.
1. Background
I implemented Happy Eyeballs version 2 (HEv2) for Socket.tcp in PR9374, but several issues have been identified:
- `IO.select` waits for name resolution or connection establishment in the `v46w` state, but when it returns a value, it does not consider the case where both events have occurred simultaneously.
- In this case, `Socket.tcp` can only capture one event and needs to execute an unnecessary loop to capture the other one, calling `IO.select` one extra time.
- `IO.select` waits for both IPv6 and IPv4 name resolution in the `start` state, but when it returns a value, it does not consider the case where name resolution for both address families is complete.
- In this case, `Socket.tcp` can only obtain the addresses of one address family and needs to execute an unnecessary loop to obtain the other addresses, calling `IO.select` one extra time.
- The consideration for `connect_timeout` was insufficient. After initiating one or more connections, it raises a 'user specified timeout' once the `connect_timeout` period elapses, even if some resolved addresses have not yet been tried.
- It does not retry with another address in case of a connection failure.
- It executes unnecessary state transitions even when an IP address is passed as the `host` argument.
- The regex for IP addresses did not correctly specify the start and end.
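To illustrate the last point, the following is a minimal sketch contrasting an unanchored pattern with one anchored by `\A` and `\z`. The patterns and constant names are hypothetical simplifications for illustration, not the ones used in `Socket.tcp`.

```ruby
# Hypothetical, simplified IPv4 patterns (NOT the ones used by Socket.tcp).
# They do not validate octet ranges; they only show the effect of anchoring.
UNANCHORED_IPV4 = /\d{1,3}(?:\.\d{1,3}){3}/
ANCHORED_IPV4   = /\A\d{1,3}(?:\.\d{1,3}){3}\z/

hosts = ["192.0.2.1", "192.0.2.1.example.com", "prefix-192.0.2.1"]

hosts.each do |host|
  loose  = host.match?(UNANCHORED_IPV4) # matches anywhere inside the string
  strict = host.match?(ANCHORED_IPV4)   # must match the whole string
  puts "#{host}: unanchored=#{loose} anchored=#{strict}"
end
```

Without the anchors, a hostname that merely contains something shaped like an address is treated as an IP address.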
2. Proposal & Outcome
To overcome the aforementioned issues, this PR introduces the following changes:
- Previously, each loop iteration represented a single state transition. This has been changed to execute all processes that meet the execution conditions within a single loop iteration.
- This prevents unnecessary loop iterations and extra calls to `IO.select`.
- Introduced logic to determine the timeout value set for `IO.select`. During the Resolution Delay and Connection Attempt Delay, the user-specified timeout is ignored. Otherwise, the timeout value is set to the larger of `resolv_timeout` and `connect_timeout`.
- This ensures that the `connect_timeout` is only detected after attempting to connect to all resolved addresses.
- Retry with another address in case of a connection failure.
- This prevents unnecessary repeated loops upon connection failure.
- Call `tcp_without_fast_fallback` when an IP address is passed as the host argument.
- This prevents unnecessary state transitions when an IP address is passed.
- Fixed regex for IP addresses.
Additionally, the code has been reduced by over 100 lines, and redundancy has been minimized, which is expected to improve readability.
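The idea of handling every ready event within a single loop iteration can be sketched as follows. This is not the `Socket.tcp` implementation: two pipes stand in for the name resolution and connection event sources, and all variable names are hypothetical.

```ruby
# Two pipes stand in for the "name resolution" and "connection" event
# sources (an assumption for illustration). Both are made readable before
# IO.select is called, so both events are pending at the same time.
resolution_r, resolution_w = IO.pipe
connection_r, connection_w = IO.pipe
resolution_w.write("resolved")
connection_w.write("connected")
resolution_w.close
connection_w.close

handled = []
select_calls = 0
pending = [resolution_r, connection_r]

until pending.empty?
  readable, = IO.select(pending, nil, nil, 1)
  select_calls += 1
  break if readable.nil? # timed out; nothing became ready

  # Handle every ready source in this iteration, not only the first one.
  readable.each do |io|
    handled << io.read
    io.close
    pending.delete(io)
  end
end

puts "handled #{handled.sort.inspect} with #{select_calls} IO.select call(s)"
```

Because `IO.select` returns every ready IO in one array, processing the whole array avoids the extra loop iteration and the extra `IO.select` call described in the background section.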
3. Performance
No significant performance changes were observed in the happy case before and after the improvement.
However, improvements in state transition deficiencies are expected to enhance performance in edge cases.
```ruby
require 'socket'
require 'benchmark'
Benchmark.bmbm do |x|
  x.report('fast_fallback: true') do
    30.times { Socket.tcp("www.ruby-lang.org", 80) }
  end
  x.report('fast_fallback: false') do # Same as Ruby 3.3
    30.times { Socket.tcp("www.ruby-lang.org", 80, fast_fallback: false) }
  end
end
```
Before:
```
~/s/build ❯❯❯ ../install/bin/ruby ../ruby/test.rb
                           user     system      total        real
fast_fallback: true    0.021315   0.040723   0.062038 (  0.504866)
fast_fallback: false   0.007553   0.026248   0.033801 (  0.533211)
```
After:
```
~/s/build ❯❯❯ ../install/bin/ruby ../ruby/test.rb
                           user     system      total        real
fast_fallback: true    0.023081   0.040525   0.063606 (  0.406219)
fast_fallback: false   0.007302   0.025515   0.032817 (  0.418680)
```
> ..., and on other POSIX systems we'll use `read`.
As `pm_string_mapped_init`'s doc comment says, it should fall back to a
`read(2)`-based implementation on platforms without memory-mapped files,
such as WASI, but it didn't. This commit fixes that by calling
`pm_string_file_init` in the fallback case.
Also, the `defined(_POSIX_MAPPED_FILES)` check for the `read(2)`-based path
is unnecessary and prevents the fallback from being executed, so this
change removes it.
https://github.com/ruby/prism/commit/b3d9064b71
Previously, GCC 11 on x86-64 inlined the heavyweight logic for
potentially triggering GC into newobj_alloc(). This slowed down
the hotter code path where the ractor cache hits, degrading
allocation throughput.
Outline the logic into a separate function and have it never inlined.
This restores allocation throughput to the same level as
98eeadc ("Development of 3.4.0 started.").
To evaluate, instrument miniruby so it allocates a bunch of objects and
then exits:
diff --git a/eval.c b/eval.c
--- a/eval.c
+++ b/eval.c
@@ -92,6 +92,15 @@ ruby_setup(void)
}
EC_POP_TAG();
+rb_gc_disable();
+rb_execution_context_t *ec = GET_EC();
+long const n = 20000000;
+for (long i = 0; i < n; ++i) {
+ rb_wb_protected_newobj_of(ec, 0, T_OBJECT, 40);
+}
+printf("alloc %ld\n", n);
+exit(0);
+
return state;
}
With `3.3-equiv` being 98eeadc, and `pre` being f2728c3393
and `post` being this commit, I have:
$ hyperfine -L buildtag post,pre,3.3-equiv '/ruby/build-{buildtag}/miniruby'
Benchmark 1: /ruby/build-post/miniruby
Time (mean ± σ): 873.4 ms ± 2.8 ms [User: 377.6 ms, System: 490.2 ms]
Range (min … max): 868.3 ms … 877.8 ms 10 runs
Benchmark 2: /ruby/build-pre/miniruby
Time (mean ± σ): 960.1 ms ± 2.8 ms [User: 430.8 ms, System: 523.9 ms]
Range (min … max): 955.5 ms … 964.2 ms 10 runs
Benchmark 3: /ruby/build-3.3-equiv/miniruby
Time (mean ± σ): 886.9 ms ± 2.8 ms [User: 379.5 ms, System: 501.0 ms]
Range (min … max): 883.0 ms … 890.8 ms 10 runs
Summary
'/ruby/build-post/miniruby' ran
1.02 ± 0.00 times faster than '/ruby/build-3.3-equiv/miniruby'
1.10 ± 0.00 times faster than '/ruby/build-pre/miniruby'
These results are from a Skylake server with GCC 11.
We discovered that having gc.o and gc_impl.o in separate translation
units diminishes codegen quality with GCC 11 on x86-64. This commit
solves that problem by including gc/default.c into gc.c, giving the
optimizer visibility into the bodies of functions again in builds
that do not use link-time optimization, which are common.
This effectively restores things to the way they were before
[Feature #20470] from the optimizer's perspective while maintaining the
ability to build gc/default.c as a DSO.
There were a few functions duplicated across gc.c and gc/default.c.
Extract them and put them into gc/gc.h.
[Bug #20653]
This commit refactors how Onigmo handles timeouts. Instead of raising a
timeout error, onig_search now returns ONIGERR_TIMEOUT, so the caller
can free memory and then raise the timeout error itself.
This fixes a memory leak in String#start_with? when the regexp times out.
For example:
    regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"
    10.times do
      100.times do
        str.start_with?(regex)
      rescue
      end
      puts `ps -o rss= -p #{$$}`
    end
Before:
33216
51936
71152
81728
97152
103248
120384
133392
133520
133616
After:
14912
15376
15824
15824
16128
16128
16144
16144
16160
16160
We use the pre-existence of `rake_path` to decide whether we need to
regenerate dummy test gems in `tmp`. When changing rubies, the previous
implementation would believe that the correct `rake_path` exists
and skip regenerating the dummy gems, resulting in an error like the
following when specs are run:
```
(...)
Could not find rubygems-generate_index lib directory in /path/to/rubygems/bundler/tmp/1/gems/base/ruby/3.2.0
# ./spec/support/builders.rb:253:in `block in update_repo'
# ./spec/support/helpers.rb:337:in `block in with_gem_path_as'
# ./spec/support/helpers.rb:351:in `without_env_side_effects'
# ./spec/support/helpers.rb:332:in `with_gem_path_as'
# ./spec/support/builders.rb:251:in `update_repo'
# ./spec/support/builders.rb:228:in `build_repo'
# ./spec/support/builders.rb:197:in `build_repo4'
# ./spec/commands/lock_spec.rb:103:in `block (2 levels) in <top (required)>'
(...)
```
To fix this, fix the part of the path that depends on the implementation
and the Ruby version so that we don't give false positives.
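As a rough sketch of the idea, an existence check becomes specific to the running Ruby when the engine and version are part of the path. The helper name `scoped_gem_home` and the `tmp/1` root are hypothetical and only mirror the layout seen in the error message above; the actual change in the spec helpers may differ.

```ruby
require "rbconfig"

# Hypothetical helper (not the actual spec support code): builds a gem home
# under `tmp_root` whose path includes the Ruby engine and the Ruby version
# directory name, so a tree created by one Ruby is not mistaken for a tree
# created by another.
def scoped_gem_home(tmp_root)
  File.join(tmp_root, "gems", "base", RUBY_ENGINE, RbConfig::CONFIG["ruby_version"])
end

path = scoped_gem_home("tmp/1")
puts path
puts File.exist?(path) ? "reuse existing dummy gems" : "regenerate dummy gems"
```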
https://github.com/rubygems/rubygems/commit/fafacfa210
Removes the symlink for gems.rb.tt and instead uses the singular
template file. Only the destination filename for the gemfile reads from
the `init_gems_rb` setting.
https://github.com/rubygems/rubygems/commit/43ce0e1666
I get a slight boost from these with GCC 11 on Intel Skylake.
Part of a larger story to fix an allocation throughput regression
compared to 98eeadc ("Development of 3.4.0 started.") as the baseline.
[Bug #20650]
The capture group allocates memory that is leaked when it times out.
For example:
    re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"
    10.times do
      100.times do
        re =~ str
      rescue Regexp::TimeoutError
      end
      puts `ps -o rss= -p #{$$}`
    end
Before:
34688
56416
78288
100368
120784
140704
161904
183568
204320
224800
After:
16288
16288
16880
16896
16912
16928
16944
17184
17184
17200
* Note that we could shift the flags by 2 on serialize & deserialize
but it does not seem worth it as it does not save serialized size
in any significant amount, i.e. average was 0.799 before #2924.
* $ bundle exec rake serialized_size:topgems
Before:
Total sizes for top 100 gems:
total source size: 90207647
total serialized size: 69477115
total serialized/total source: 0.770
Stats of ratio serialized/source per file:
average: 0.844
median: 0.825
1st quartile: 0.597
3rd quartile: 1.064
min - max: 0.078 - 3.792
After:
Total sizes for top 100 gems:
total source size: 90207647
total serialized size: 66150209
total serialized/total source: 0.733
Stats of ratio serialized/source per file:
average: 0.800
median: 0.779
1st quartile: 0.568
3rd quartile: 1.007
min - max: 0.076 - 3.675
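For intuition about the note on shifting the flags by 2, here is a small sketch. It assumes, purely for illustration, that flags are written as an unsigned LEB128-style variable-length integer and that the two lowest bits carry no serialized information; neither assumption is confirmed by this commit, and `varint_size` and the sample flag values are hypothetical.

```ruby
# Number of bytes needed to store `value` as an unsigned LEB128-style
# varint: 7 payload bits per byte.
def varint_size(value)
  size = 1
  while value >= 0x80
    value >>= 7
    size += 1
  end
  size
end

# Hypothetical flag values whose two lowest bits are assumed to be unused.
[0b0000_0100, 0b0111_1100, 0b1000_0000, 0b1_1111_1100].each do |flags|
  plain   = varint_size(flags)
  shifted = varint_size(flags >> 2)
  puts "flags=0b#{flags.to_s(2)} plain=#{plain} byte(s) shifted=#{shifted} byte(s)"
end
```

Under these assumptions, shifting only saves a byte for flag values in the range 128 to 511, which is consistent with the observation that the saving is not significant overall.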
https://github.com/ruby/prism/commit/e012072f70
* $ bundle exec rake serialized_size:topgems
Before:
Total sizes for top 100 gems:
total source size: 90207647
total serialized size: 86284647
total serialized/total source: 0.957
Stats of ratio serialized/source per file:
average: 0.952
median: 0.937
1st quartile: 0.669
3rd quartile: 1.206
min - max: 0.080 - 4.065
After:
Total sizes for top 100 gems:
total source size: 90207647
total serialized size: 69477115
total serialized/total source: 0.770
Stats of ratio serialized/source per file:
average: 0.844
median: 0.825
1st quartile: 0.597
3rd quartile: 1.064
min - max: 0.078 - 3.792
https://github.com/ruby/prism/commit/cf90fe5759
OpenSSL::ASN1 is being rewritten in Ruby. To make that easier, let's
remove the dependency on the instance variables and the internal-use
function ossl_asn1_get_asn1type() outside OpenSSL::ASN1.
This also fixes the insufficient validation of the passed value with
its tagging.
https://github.com/ruby/openssl/commit/35a157462e