The vm1_ prefix and vm2_ had had special meaning until
820ad9cb1d and
12068aa4e9. AFAIK there's no special
meaning in vm3_ prefix.
As they have confused people (like "In `benchmark` what is difference
between `vm1_`, `vm2_` and `vm3_`"), I'd like to remove the obsoleted
prefix as we obviated that two years ago.
for VM_METHOD_TYPE_CFUNC.
This has been known to decrease optcarrot fps:
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 66.38132676191719 67.41369177299630 fps
69.42728743772243 68.90327567263054
72.16028300263211 69.62605130880686
72.46631319102777 70.48818243767207
73.37078877002490 70.79522887347566
73.69422431217367 70.99021920193194
74.01471487018695 74.69931965402584
75.48685183295630 74.86714575949016
75.54445264507932 75.97864419721677
77.28089738169756 76.48908637569581
78.04183397891302 76.54320932488021
78.36807984096562 76.59407262898067
78.92898762543574 77.31316743361343
78.93576483233765 77.97153484180480
79.13754917503078 77.98478782102325
79.62648945850653 78.02263322726446
79.86334213878064 78.26333724045934
80.05100635898518 78.60056756355614
80.26186843769584 78.91082645644468
80.34205717020330 79.01226659142263
80.62286066044338 79.32733939423721
80.95883033058557 79.63793060542024
80.97376819251613 79.73108936622778
81.23050939202896 80.18280109433088
```
and I deleted this capability in an early stage of YARV-MJIT development:
0ab130feee
I suspect either of the following things could be the cause:
* Directly calling vm_call_cfunc requires more optimization effort in GCC,
resulting in 30ms-ish compilation time increase for such methods and
decreasing the number of methods compiled in a benchmarked period.
* Code size increase => icache miss hit
These hypotheses could be verified by some methodologies. However, I'd
like to introduce this regardless of the result because this blocks
inlining C method's definition.
I may revert this commit when I give up to implement inlining C method
definition, which requires this change.
Microbenchmark-wise, this gives slight performance improvement:
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_send_cfunc.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
before --jit after --jit
mjit_send_cfunc 41.961M 56.489M i/s - 100.000M times in 2.383143s 1.770244s
Comparison:
mjit_send_cfunc
after --jit: 56489372.5 i/s
before --jit: 41961388.1 i/s - 1.35x slower
```
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
Instead of searching twice to extract and to delete, extract and
delete the found position at the first search.
This makes faster nearly twice, for regexps and strings.
| |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|regexp-short | 2.143M| 3.918M|
|regexp-long | 105.162k| 205.410k|
|string-short | 3.789M| 7.964M|
|string-long | 1.301M| 2.457M|
I noticed that some files in rubygems were executable, and I could think
of no reason why they should be.
In general, I think ruby files should never have the executable bit set
unless they include a shebang, so I run the following command over the
whole repo:
```bash
find . -name '*.rb' -type f -executable -exec bash -c 'grep -L "^#!" $1 || chmod -x $1' _ {} \;
```
* Stop making a redundant hash copy in Hash#dup
It was making a copy of the hash without rehashing, then created an
extra copy of the hash to do the rehashing. Since rehashing creates
a new copy already, this change just uses that rehashing to make
the copy.
[Bug #16121]
* Remove redundant Check_Type after to_hash
* Fix freeing and clearing destination hash in Hash#initialize_copy
The code was assuming the state of the destination hash based on the
source hash for clearing any existing table on it. If these don't match,
then that can cause the old table to be leaked. This can be seen by
compiling hash.c with `#define HASH_DEBUG 1` and running the following
script, which will crash from a debug assertion.
```ruby
h = 9.times.map { |i| [i, i] }.to_h
h.send(:initialize_copy, {})
```
* Remove dead code paths in rb_hash_initialize_copy
Given that `RHASH_ST_TABLE_P(h)` is defined as `(!RHASH_AR_TABLE_P(h))`
it shouldn't be possible for a hash to be neither of these, so there
is no need for the removed `else if` blocks.
* Share implementation between Hash#replace and Hash#initialize_copy
This also fixes key rehashing for small hashes backed by an array
table for Hash#replace. This used to be done consistently in ruby
2.5.x, but stopped being done for small arrays in ruby 2.6.x.
This also bring optimization improvements that were done for
Hash#initialize_copy to Hash#replace.
* Add the Hash#dup benchmark
I noticed that in case of cache misshit, re-calculated cc->me can
be the same method entry than the pevious one. That is an okay
situation but can't we partially reuse the cache, because cc->call
should still be valid then?
One thing that has to be special-cased is when the method entry
gets amended by some refinements. That happens behind-the-scene
of call cache mechanism. We have to check if cc->me->def points to
the previously saved one.
Calculating -------------------------------------
trunk ours
vm2_poly_same_method 1.534M 2.025M i/s - 6.000M times in 3.910203s 2.962752s
Comparison:
vm2_poly_same_method
ours: 2025143.9 i/s
trunk: 1534447.2 i/s - 1.32x slower
This approach is simpler than the previous approach which tries to
emulate realpath(3). It also performs much better on both Linux and
OpenBSD on the included benchmarks.
By using realpath(3), we can better integrate with system security
features such as OpenBSD's unveil(2) system call.
This does not use realpath(3) on Windows even if it exists, as the
approach for checking for absolute paths does not work for drive
letters. This can be fixed without too much difficultly, though until
Windows defines realpath(3), there is no need to do so.
For File.realdirpath, where the last element of the path is not
required to exist, fallback to the previous approach, as realpath(3)
on most operating systems requires the whole path be valid (per POSIX),
and the operating systems where this isn't true either plan to conform
to POSIX or may change to conform to POSIX in the future.
glibc realpath(3) does not handle /path/to/file.rb/../other_file.rb
paths, returning ENOTDIR in that case. Fallback to the previous code
if realpath(3) returns ENOTDIR.
glibc doesn't like realpath(3) usage for paths like /dev/fd/5,
returning ENOENT even though the path may appear to exist in the
filesystem. If ENOENT is returned and the path exists, then fall
back to the default approach.
and switch-case branches.
Buffer allocation optimization using `ALLOCA_N` would be the main
benefit of patch. It eliminates the O(N) buffer extensions.
It also reduces the number of branches using escape table like
https://mattn.kaoriya.net/software/lang/c/20160817011915.htm.
Closes: https://github.com/ruby/ruby/pull/2226
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>
and switch-case branches.
Buffer allocation optimization using `ALLOCA_N` would be the main
benefit of patch. It eliminates the O(N) buffer extensions.
It also reduces the number of branches using escape table like
https://mattn.kaoriya.net/software/lang/c/20160817011915.htm.
Closes: https://github.com/ruby/ruby/pull/2226
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>
I heard actually this part would not be a bottleneck for rendering
because writing anything to terminal takes way longer time anyway, but I
thought this benchmark script might be useful for benchmarking Ruby
itself.
To prevent noise for benchmark result. Just for the case.
[Bug #15552]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66893 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
To support the change of default encoding.
It had not worked correctly since 2.0 :-)
[Bug #15552]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Add new benchmark scripts for binary operations of Complex with float
components.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66680 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/lib/load.rb: add small utility which requires
benchmark-driver.rb. You can load this file and can
use benchmark-driver.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64870 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Simply use DISPATCH_ORIGINAL_INSN instead of rb_funcall. This is,
when possible, overall performant because method dispatch results are
cached inside of CALL_CACHE. Should also be good for JIT.
----
trunk: ruby 2.6.0dev (2018-09-12 trunk 64689) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-12 leaf-insn 64688) [x86_64-darwin15]
last_commit=make opt_str_freeze leaf
Calculating -------------------------------------
trunk ours
vm2_freezestring 5.440M 31.411M i/s - 6.000M times in 1.102968s 0.191017s
Comparison:
vm2_freezestring
ours: 31410864.5 i/s
trunk: 5439865.4 i/s - 5.77x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64690 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's only available for limited platforms for now.
I'll make it portable and show it later.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark/README.md: fix help output, which is changed on v0.14.6.
Especially `e1::path1,arg1,...; e2::path2,arg2` part was wrong since `,`
can't be used to split arguments anymore.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63945 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to improve the accuracy of measurement by stop using block.
benchmark/app_erb.rb -> benchmark/app_erb.yml: renamed and revised
benchmark/erb_render.rb -> benchmark/erb_render.yml: ditto
benchmark/README.md: follow renames
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
the original behavior of benchmark/driver.rb.
Probably I won't use this but this is requested by ko1.
Use this with:
make benchmark OPTS="-o driver"
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63937 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
supported by legacy benchmark/driver.rb.
benchmark/README.md: document them
common.mk: update benchmark_driver to correct 0.0 output and to fix
spacing format of `-o simple` and `-o markdown`.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
runner/peak.rb: ditto
This is needed to make commands like `make -C .ruby-svn benchmark
ITEM=erb OPTS="-r size -o simple"` succeed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63932 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark_driver is the official way to install benchmark_driver.
benchmark-driver is just an alias for it.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63930 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark/*.rb is only benchmarks now. We don't need prefixes.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63928 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts r63900.
Having single-execution benchmark as a normal Ruby script is preferred
by ko1. I'm not a big fan of having inconsistent benchmark formats, but
I can understand some benefits of it.
common.mk: remove obsolsted benchmark-each PHONY declaration, support
running Ruby scripts added by this commit.
README.md: follow ARGS change
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
by adding runner plugins for them.
benchmark/lib/benchmark_driver/runner/peak.rb: added peak runner plugin
benchmark/lib/benchmark_driver/runner/size.rb: added size runner plugin
common.mk: allow using them
benchmark/memory_wrapper.rb: deleted in favor of those runner plugins
benchmark/README.md: document them
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
in favor of just using benchmark_driver.gem.
common.mk: The new `make benchmark` covers the both usages for old `make
benchmark` and old `make benchmark-each`. So `make benchmark-each` is
dropped now.
benchmark/README.md: Explain its details
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63918 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark/driver.rb: deal with breaking changes which are actually
introduced for this driver.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63916 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to original benchmark-driver command.
I'm going to add some runner plugins to resurrect metrics which were
originally supported by benchmark/driver.rb...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Now all benchmarks are converted to YAMLs.
common.mk: Drop obsoleted bm_* pattern
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63902 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
scripts.
This is needed to finish converting Ruby scripts to YAMLs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63899 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
for now.
When measured script is really too fast, while loop substituion may
return a negative benchmark result.
Probably benchmark_driver.gem has rooms to be improved about this.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63895 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark/driver.rb had removed the cost for while loop in benchmark/bm_vm1_*.rb,
and benchmark_driver.gem can achieve the same thing with `loop_count`.
But unfortunately current benchmark_driver.gem can't solve it only for vm1_yield.yml...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63893 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This YAML transformation is needed to support whileloop2 time substituion
by benchmark_driver.gem later.
This commmit changes no benchmark behavior.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This YAML transformation is needed to support whileloop time substituion
by benchmark_driver.gem later.
This commmit changes no benchmark behavior.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Makefile.in: Clone benchmark-driver repository in benchmark/benchmark-driver
`make update-benchmark-driver`, like simplecov.
win32/Makefile.sub: Roughly do the same thing.
.gitignore: Ignore the cloned repository.
common.mk: Trigger `make update-benchmark-driver` to run `make benchmark`
and adjust arguments for benchmark_driver.gem.
benchmark/require.yml: renamed from benchmark/bm_require.rb, benchmark/prepare_require.rb
benchmark/require_thread.yml: renamed from benchmark/bm_require_thread.rb, benchmark/prepare_require_thread.rb
benchmark/so_count_words.yml: renamed from benchmark/bm_so_count_words.rb, benchmark/prepare_so_count_words.rb,
benchmark/wc.input.base
benchmark/so_k_nucleotide.yml: renamed from benchmark/bm_so_k_nucleotide.rb, benchmark/prepare_so_k_nucleotide.rb,
benchmark/make_fasta_output.rb
benchmark/so_reverse_complement.yml: renamed from benchmark/bm_so_reverse_complement.rb, benchmark/prepare_so_reverse_complement.rb,
benchmark/make_fasta_output.rb
I'm sorry but I made some duplications between benchmark/require.yml and benchmark/require_thread.yml,
and between benchmark/so_k_nucleotide.yml and benchmark/so_reverse_complement.yml.
If you're not comfortable with it, please combine these YAMLs to share
the same prelude. One YAML file can have multiple benchmark definitions
sharing prelude.
benchmark/driver.rb: Replace its core feature with benchmark_driver.gem.
Some old features are gone for now, but I'll add them again later.
[Misc #14902]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63888 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
It seems like they are all benchmark drivers but "benchmark/driver.rb"
is the latest and others are no longer used.
It's confusing to have multiple drivers (and actually I used
benchmark/run.rb since I didn't know I should use benchmark/driver.rb).
As I'm going to support only benchmark/driver.rb features in Misc#14902,
let me delete them.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
"Real" time is too unstable on my systems, hopefully counting
only CPU time can gain more reliable benchmark results.
[ruby-core:87362] [Feature #14815]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63564 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
benchmark/memory_wrapper.rb will Kernel#load these
scripts, preventing DATA from being initialized, so
use heredoc instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63497 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb: use `--version` instead of `-v` to get version
information.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62515 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Feature #14045]
* insns.def (getblockparam, setblockparam): add special access
instructions for block parameters.
getblockparam checks VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM and
if it is not set this instruction creates a Proc object from
a given blcok and set VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM.
setblockparam is similar to setlocal, but set
VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM.
* compile.c: use get/setblockparm instead get/setlocal instructions.
Note that they are used for method local block parameters (def m(&b)),
not for block local method parameters (iter{|&b|).
* proc.c (get_local_variable_ptr): creates Proc object for
Binding#local_variable_get/set.
* safe.c (safe_setter): we need to create Proc objects for postponed
block parameters when $SAFE is changed.
* vm_args.c (args_setup_block_parameter): used only for block local blcok
parameters.
* vm_args.c (vm_caller_setup_arg_block): if called with
VM_CALL_ARGS_BLOCKARG_BLOCKPARAM flag then passed block values should be
a block handler.
* test/ruby/test_optimization.rb: add tests.
* benchmark/bm_vm1_blockparam*: added.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60397 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This means File.chmod, File.lchmod, File.chown, File.lchown,
File.unlink, and File.utime operations on slow filesystems
no longer hold up other threads.
The platform-specific utime_failed changes is compile-tested
using a new UTIME_EINVAL macro
This hurts performance on fast filesystem, but these methods
are unlikely to be performance bottlenecks and (IMHO) avoiding
pathological slowdowns and stalls are more important.
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name trunk built
file_chmod 0.591 0.801
Speedup ratio: compare with the result of `trunk' (greater is better)
name built
file_chmod 0.737
* file.c (UTIME_EINVAL): new macro to ease compile-testing
* file.c (struct apply_arg): new struct
* file.c (no_gvl_apply2files): new function
* file.c (apply2files): release GVL
* file.c (chmod_internal): adjust for apply2files changes
* file.c (lchmod_internal): ditto
* file.c (chown_internal): ditto
* file.c (lchown_internal): ditto
* file.c (utime_failed): ditto
* file.c (utime_internal): ditto
* file.c (unlink_internal): ditto
[ruby-core:83200] [Feature #13996]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This converts all slow syscalls in the Dir.empty? implementation
to release GVL. We avoid unnecessarily GVL release and
reacquire for each slow call (opendir, readdir, closedir) and
instead only release and acquire the GVL once in the common
case.
Benchmark results show a small degradation in single-threaded
performance:
Execution time (sec)
name trunk built
dir_empty_p 0.689 0.758
Speedup ratio: compare with the result of `trunk' (greater is better)
name built
dir_empty_p 0.909
* dir.c (rb_gc_for_fd_with_gvl): new function
(nogvl_dir_empty_p): ditto
(dir_s_empty_p): use new functions to release GVL
* benchmark/bm_dir_empty_p.rb: new benchmark
[ruby-core:83071] [Feature #13958]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60111 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rename(2) requires two pathname resolution operations which can
take considerable time on slow filesystems, release the GVL so
operations on other threads may proceed.
On fast, local filesystems, this change results in some slowdown
as shown by the new benchmark. I consider the performance trade
off acceptable as cases are avoided.
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name trunk built
file_rename 2.648 2.804
Speedup ratio: compare with the result of `trunk' (greater is better)
name built
file_rename 0.944
* file.c (no_gvl_rename): new function
(rb_file_s_rename): release GVL for renames
* benchmark/bm_file_rename.rb: new benchmark
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_strseq_index): refactor and avoid
call of str_strlen() when offset == 0.
it will improve performance of String#index and #include?
* benchmark/bm_string_index.rb: benchmark for this change
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Feature #13884]
Reduce number of memory allocations for "and", "or" and "diff"
operations on small arrays
Very often, arrays are used to filter parameters and to select
interesting items from 2 collections and very often these
collections are small enough, for example:
```ruby
SAFE_COLUMNS = [:id, :title, :created_at]
def columns
@all_columns & SAFE_COLUMNS
end
```
In this patch, I got rid of unnecessary memory allocations for
small arrays when "and", "or" and "diff" operations are performed.
name | HEAD | PATCH
-----------------+------:+------:
array_small_and | 0.615 | 0.263
array_small_diff | 0.676 | 0.282
array_small_or | 0.953 | 0.463
name | PATCH
-----------------+------:
array_small_and | 2.343
array_small_diff | 2.392
array_small_or | 2.056
name | HEAD | PATCH
-----------------+------:+------:
array_small_and | 1.429 | 1.005
array_small_diff | 1.493 | 0.878
array_small_or | 1.672 | 1.152
name | PATCH
-----------------+------:
array_small_and | 1.422
array_small_diff | 1.700
array_small_or | 1.452
Author: Dmitry Bochkarev <dimabochkarev@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
from bm_app_erb_render.rb.
I'm told from ko1 that bm_app_* is namespace for Ruby applications,
not for ERB and we should use bm_erb_* for ERB benchmark instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to skip object allocation for static string.
We can't always enable frozen_string_literal pragma because we can't
freeze string literals embedded by user for backward compatibility.
So we need to use fstring for each static string.
Since adding ".freeze" to string literals in #content_dump is slow
on compiling, I used unary "-" operator instead.
benchmark/bm_app_erb_render.rb: Added rendering-only benchmark to
test rendering performance on production environment.
This benchmark is created to reproduce the behavior on Sinatra (Tilt).
Thus it doesn't use ERB#result to skip parsing compiled code.
It doesn't use ERB#def_method too to regard `title` and `content` as
local variables. If we use #def_method, `title` and `content` needs
to be method call. I wanted to avoid it.
This patch's benchmark results is:
* Before
app_erb_render 1.250
app_erb 0.704
* After
app_erb_render 1.066
app_erb 0.686
This patch optimizes rendering performance (app_erb_render) without
spoiling (total of rendering +) compiling performance (app_erb).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58905 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
There are currently no benchmarks for Fiber performance, I
should've committed this years ago when [Feature #10341] was
implemented.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58606 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I was about to write off this benchmark while working on GVL
improvements on multi-core systems.
However I noticed it exposes a weakness in my work-in-progress
code when I tested on an old single CPU system. Further testing
reveals setting CPU affinity ("schedtool -a 0x1" on Linux) on a
modern multi-core system is enough to reproduce the problem
exposed by this benchmark.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58571 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
"lazy_sweep" does not appear to have ever been a valid kwarg
for GC.start, however the opposite of "lazy_sweep" appears
to be "immediate_sweep". So use immediate_sweep, and flip
the boolean value of each arg.
I guess this only started failing with r56981 in Dec 2016
("class.c: missing unknown_keyword_error",
commit e3f0cca2f26ba44c810ac980cdff7dda129ae533)
* benchmark/bm_vm1_gc_wb_ary.rb: "lazy_sweep: false" => "immediate_sweep: true"
* benchmark/bm_vm1_gc_wb_ary_promoted.rb: ditto
* benchmark/bm_vm1_gc_wb_obj.rb: ditto
* benchmark/bm_vm1_gc_wb_obj_promoted.rb: ditto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is currently for testing GVL performance in the uncontended
case: IO#write and IO#read unconditionally release GVL for
blocking I/O with pipe.
It will also be interesting to see how this changes if we switch
to M:N threading model.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58556 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The performance of SizedQueue is a bit more complex than
regular Queue, so it deserves a separate benchmark.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
For testing Linux socket-only workaround for
https://bugs.ruby-lang.org/issues/13085
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57364 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb: default output file is not used when
loading rawdata.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57287 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (BenchmarkDriver.load): extract loop times
from the loaded results to adjust the results.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57286 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): count adjusted result marks
as the name width.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57285 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I will attempt to reduce garbage in proposed fix
for https://bugs.ruby-lang.org/issues/13085
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57237 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This helps hit inline method caches more frequently. Before this
commit:
```
[aaron@TC ruby (trunk)]$ time ./ruby -v benchmark/bm_vm2_poly_singleton.rb
ruby 2.4.0dev (2016-09-12 trunk 56141) [x86_64-darwin15]
real 0m3.679s
user 0m3.632s
sys 0m0.022s
```
After this commit:
```
[aaron@TC ruby (trunk)]$ time ./ruby -v benchmark/bm_vm2_poly_singleton.rb
ruby 2.4.0dev (2016-09-12 trunk 56141) [x86_64-darwin15]
last_commit=Copy the serial number from the super class to the singleton class
real 0m2.246s
user 0m2.203s
sys 0m0.020s
```
[Feature #12364]
[ruby-core:75425]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56144 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/memory_wrapper.rb: use respond_to? because
member? does not work well.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54062 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
use `--measure-target=[target]'.
Now, we can use the following targets:
* real (default): real time which returns process time in sec.
* peak: peak memory usage (physical memory) in bytes.
* size: last memory usage (physical memory) in bytes.
* benchmark/memory_wrapper.rb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The OR-ing itself is bad for a hash function, and shifting 3 bits
left was not enough to undo the damage done by shifting
(RUBY_SPECIAL_SHIFT+3) bits right. Experimentally, shifting 16-17
bits seemed to work well in preparing the number for murmur hash.
Add a few more benchmarks to based on bm_hash_shift to ensure
we don't hurt performance too much with tweaks.
I'm pretty confident about this change and commiting it now;
especially since we're still using Murmur behind it (but perhaps
we can update to a newer hash from Murmur...)
[ruby-core:72028] [Feature #11405]
target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby"
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name a b
hash_aref_dsym 0.279 0.276
hash_aref_dsym_long 4.951 4.936
hash_aref_fix 0.281 0.283
hash_aref_flo 0.060 0.060
hash_aref_miss 0.409 0.410
hash_aref_str 0.387 0.385
hash_aref_sym 0.275 0.270
hash_aref_sym_long 0.410 0.411
hash_flatten 0.252 0.237
hash_ident_flo 0.035 0.032
hash_ident_num 0.254 0.251
hash_ident_obj 0.252 0.256
hash_ident_str 0.250 0.252
hash_ident_sym 0.259 0.270
hash_keys 0.267 0.267
hash_shift 0.016 0.015
hash_shift_u16 0.074 0.072
hash_shift_u24 0.071 0.071
hash_shift_u32 0.073 0.072
hash_to_proc 0.008 0.008
hash_values 0.263 0.264
Speedup ratio: compare with the result of `a' (greater is better)
name b
hash_aref_dsym 1.009
hash_aref_dsym_long 1.003
hash_aref_fix 0.993
hash_aref_flo 1.001
hash_aref_miss 0.996
hash_aref_str 1.006
hash_aref_sym 1.017
hash_aref_sym_long 0.998
hash_flatten 1.061
hash_ident_flo 1.072
hash_ident_num 1.012
hash_ident_obj 0.987
hash_ident_str 0.993
hash_ident_sym 0.959
hash_keys 0.997
hash_shift 1.036
hash_shift_u16 1.039
hash_shift_u24 1.001
hash_shift_u32 1.017
hash_to_proc 1.001
hash_values 0.995
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53037 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
nil/true/false are special literals just like floats, integers,
literal strings, and symbols. Optimize when statements with
them by using a jump table, too.
target 0: a (ruby 2.3.0dev (2015-12-08 trunk 52928) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-08 master 52928) [x86_64-linux]) at "/home/ew/ruby/b/ruby"
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name a b
loop_whileloop2 0.102 0.103
vm2_case_lit* 1.657 0.549
Speedup ratio: compare with the result of `a' (greater is better)
name b
loop_whileloop2 0.988
vm2_case_lit* 3.017
* benchmark/bm_vm2_case_lit.rb: new benchmark
* compile.c (case_when_optimizable_literal): add nil/true/false
* insns.def (opt_case_dispatch): ditto
* vm.c (vm_redefinition_check_flag): ditto
* vm.c (vm_init_redefined_flag): ditto
* vm_core.h: ditto
* object.c (InitVM_Object): define === explicitly for nil/true/false
* test/ruby/test_case.rb (test_deoptimize_nil): new test
* test/ruby/test_optimization.rb (test_opt_case_dispatch): update
(test_eqq): new test
[ruby-core:71923] [Feature #11769]
Original patch by Aaron Patterson <tenderlove@ruby-lang.org>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52931 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_io_nonblock_noex2.rb: new benchmark based
on bm_io_nonblock_noex.rb
* io.c (io_read_nonblock): move documentation to prelude.rb
(io_write_nonblock): ditto
(Init_io): private, internal methods for prelude.rb use only
* prelude.rb (IO#read_nonblock): wrapper + documentation
(IO#write_nonblock): ditto
[ruby-core:71439] [Feature #11339]
rb_scan_args and hash lookups for kwargs in the C API are clumsy and
slow. Instead of improving the C API for performance, use Ruby
instead :)
Implement IO#read_nonblock and IO#write_nonblock in prelude.rb
to avoid argument parsing via rb_scan_args and hash lookups.
This speeds up IO#write_nonblock and IO#read_nonblock benchmarks
in both cases, including the original non-idiomatic case where
the `exception: false' hash is pre-allocated to avoid GC pressure.
Now, writing the kwargs in natural, idiomatic Ruby is fastest.
I've added the noex2 benchmark to show this.
2015-11-12 01:41:12 +0000
target 0: a (ruby 2.3.0dev (2015-11-11 trunk 52540) [x86_64-linux])
target 1: b (ruby 2.3.0dev (2015-11-11 avoid-kwarg-capi 52540)
-----------------------------------------------------------
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name a b
io_nonblock_noex 2.508 2.382
io_nonblock_noex2 2.950 1.882
Speedup ratio: compare with the result of `a' (greater is better)
name b
io_nonblock_noex 1.053
io_nonblock_noex2 1.567
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_require_thread.rb: new benchmark for conflicting
require vs thread. like [Bug #11559]
* prepare_require.rb: new file for preparing above tests.
* prepare_require.rb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52083 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_resurrect): fix resurrection of short enough to
be embedded but not embedded string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52073 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_hash_aref_dsym.rb: new benchmark
* benchmark/bm_hash_aref_dsym_long.rb: ditto
* benchmark/bm_hash_aref_fix.rb: ditto
[ruby-core:70129] [Bug #11396] pointed out we need to consider
more cases for hashing.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51435 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Delay hash lookups until we are about to hit an exception. This
gives a minor speedup ratio of 2-3% in the new bm_io_nonblock_noex
benchmark as well as reducing code.
* benchmark/bm_io_nonblock_noex.rb: new benchmark
* ext/openssl/ossl_ssl.c (no_exception_p): new function
(ossl_start_ssl): adjust for no_exception_p
(ossl_ssl_connect): adjust ossl_start_ssl call
(ossl_ssl_connect_nonblock): ditto
(ossl_ssl_accept): ditto
(ossl_ssl_accept_nonblock): ditto
(ossl_ssl_read_internal): adjust for no_exception_p
(ossl_ssl_write_internal): ditto
(ossl_ssl_write): adjust ossl_write_internal call
(ossl_ssl_write_nonblock): ditto
* ext/stringio/stringio.c (strio_read_nonblock):
delay exception check
* io.c (no_exception_p): new function
(io_getpartial): call no_exception_p
(io_readpartial): adjust for io_getpartial
(get_kwargs_exception): remove
(io_read_nonblock): adjust for io_getpartial,
check no_exception_p on EOF
(io_write_nonblock): call no_exception_p
(rb_io_write_nonblock): do not check `exception: false'
(argf_getpartial): adjust for io_getpartial
[ruby-core:69778] [Feature #11318]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This recovers and improves performance of Marshal.dump/load on
Time objects compared to when we implemented generic ivars
entirely using st_table.
This also recovers some performance on other generic ivar objects,
but does not bring bring Marshal.dump/load performance up to
previous speeds.
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name trunk geniv after
marshal_dump_flo 0.343 0.334 0.335
marshal_dump_load_geniv 0.487 0.527 0.495
marshal_dump_load_time 1.262 1.401 1.257
Speedup ratio: compare with the result of `trunk' (greater is better)
name geniv after
marshal_dump_flo 1.026 1.023
marshal_dump_load_geniv 0.925 0.985
marshal_dump_load_time 0.901 1.004
* include/ruby/intern.h (rb_generic_ivar_table): deprecate
* internal.h (rb_attr_delete): declare
* marshal.c (has_ivars): use rb_ivar_foreach
(w_ivar): ditto
(w_object): update for new interface
* time.c (time_mload): use rb_attr_delete
* variable.c (generic_ivar_delete): implement
(rb_ivar_delete): ditto
(rb_attr_delete): ditto
[ruby-core:69323] [Feature #11170]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@50680 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
slowpath for WB.
Before this change bm_vm1_gc_wb_ary.rb tried to check the performance
for WB slowpath (making a reference from oldobj to newobj). However,
from Ruby 2.2, 3 GCs are needed to promote new objects because
only 3 age objects are promted objects.
To compare fastpath and slowpath, introduce new "promoted" version
benchmark.
bm_vm1_gc_wb_ary.rb is for fastpath and
bm_vm1_gc_wb_ary_promoted.rb is for slowpath.
* benchmark/bm_vm1_gc_wb_obj(_promtoed).rb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49985 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb: add --load-rawdata option to load dumped
rawdata and just output it without actual benchmark.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49869 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): dump the rawdata in some
formats, yaml, json, and pretty_inspect.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49868 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (BenchmarkDriver#adjusted_results):
calculate max width of names at last.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49856 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): fix index of results.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): support plain text style
table format output.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This avoids O(n) on lookups with structs over 10 members.
This also avoids O(n) behavior on all assignments on Struct members.
Members 0..9 still use existing C methods to read in O(1) time
Benchmark results:
vm2_struct_big_aref_hi* 1.305
vm2_struct_big_aref_lo* 1.157
vm2_struct_big_aset* 3.306
vm2_struct_small_aref* 1.015
vm2_struct_small_aset* 3.273
Note: I chose use loading instructions from an array instead of writing
directly to linked-lists in compile.c for ease-of-maintainability. We
may move the method definitions to prelude.rb-like files in the future.
I have also tested this patch with the following patch to disable
the C ref_func methods and ensured the test suite and rubyspec works
--- a/struct.c
+++ b/struct.c
@@ -209,7 +209,7 @@ setup_struct(VALUE nstr, VALUE members)
ID id = SYM2ID(ptr_members[i]);
VALUE off = LONG2NUM(i);
- if (i < N_REF_FUNC) {
+ if (0 && i < N_REF_FUNC) {
rb_define_method_id(nstr, id, ref_func[i], 0);
}
else {
* iseq.c (rb_method_for_self_aref, rb_method_for_self_aset):
new methods to generate bytecode for struct.c
[Feature #10575]
* struct.c (rb_struct_ref, rb_struct_set): remove
(define_aref_method, define_aset_method): new functions
(setup_struct): use new functions
* test/ruby/test_struct.rb: add test for struct >10 members
* benchmark/bm_vm2_struct_big_aref_hi.rb: new benchmark
* benchmark/bm_vm2_struct_big_aref_lo.rb: ditto
* benchmark/bm_vm2_struct_big_aset.rb: ditto
* benchmark/bm_vm2_struct_small_aref.rb: ditto
* benchmark/bm_vm2_struct_small_aset.rb: ditto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Dynamic symbols hash more slowly because they need extra method
dispatch in rb_any_hash. I am not sure if dynamic symbols are
a realistic use case as hash keys, so this commit only
restores performance when comparing against versions of Ruby
which lack dsyms.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_app_aobench.rb: update outdated links to the
original program. [ruby-dev:48550] [Feature #10247]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* proc.c (rb_proc_alloc): inline and move to vm.c
(rb_proc_wrap): new wrapper function used by rb_proc_alloc
(proc_dup): simplify alloc + copy + wrap operation
[ruby-core:64994]
* vm.c (rb_proc_alloc): new inline function
(rb_vm_make_proc): call rb_proc_alloc
* vm_core.h: remove rb_proc_alloc, add rb_proc_wrap
* benchmark/bm_vm2_newlambda.rb: short test to show difference
First we allocate and populate an rb_proc_t struct inline to avoid
unnecessary zeroing of the large struct. Inlining speeds up callers as
this takes many parameters to ensure correctness. We then call the new
rb_proc_wrap function to create the object.
rb_proc_wrap - wraps a rb_proc_t pointer as a Ruby object, but
we only use it inside rb_proc_alloc. We must call this before
the compiler may clobber VALUE parameters passed to rb_proc_alloc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This program is described closely in "Understanding Computation"
chapter 6 by Tom Stuart. <http://computationbook.com/>
Japanese translation will be published soon.
<http://www.oreilly.co.jp/books/9784873116976/>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
A doubly-linked list for tracking living threads guarantees
constant-time insert/delete performance with no corner cases of a
hash table. I chose this ccan implementation of doubly-linked
lists over the BSD sys/queue.h implementation since:
1) insertion and removal are both branchless
2) locality is improved if a struct may be a member of multiple lists
(0002 patch in Feature 9632 will introduce a secondary list
for waiting FDs)
This also increases cache locality during iteration: improving
performance in a new IO#close benchmark with many sleeping threads
while still scanning the same number of threads.
vm_thread_close 1.762
* vm_core.h (rb_vm_t): list_head and counter for living_threads
(rb_thread_t): vmlt_node for living_threads linkage
(rb_vm_living_threads_init): new function wrapper
(rb_vm_living_threads_insert): ditto
(rb_vm_living_threads_remove): ditto
* vm.c (rb_vm_living_threads_foreach): new function wrapper
* thread.c (terminate_i, thread_start_func_2, thread_create_core,
thread_fd_close_i, thread_fd_close): update to use new APIs
* vm.c (vm_mark_each_thread_func, rb_vm_mark, ruby_vm_destruct,
vm_memsize, vm_init2, Init_VM): ditto
* vm_trace.c (clear_trace_func_i, rb_clear_trace_func): ditto
* benchmark/bm_vm_thread_close.rb: added to show improvement
* ccan/build_assert/build_assert.h: added as a dependency of list.h
* ccan/check_type/check_type.h: ditto
* ccan/container_of/container_of.h: ditto
* ccan/licenses/BSD-MIT: ditto
* ccan/licenses/CC0: ditto
* ccan/str/str.h: ditto (stripped of unused macros)
* ccan/list/list.h: ditto
* common.mk: add CCAN_LIST_INCLUDES
[ruby-core:61871][Feature 9632 (part 1)]
Apologies for the size of this commit, but I think a good
doubly-linked list will be useful for future features, too.
This may be used to add ordering to a container_of-based hash
table to preserve compatibility if required (e.g. feature 9614).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e