The OR-ing itself is bad for a hash function, and shifting 3 bits
left was not enough to undo the damage done by shifting
(RUBY_SPECIAL_SHIFT+3) bits right. Experimentally, shifting 16-17
bits seemed to work well in preparing the number for murmur hash.
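As a rough Ruby sketch of that preparation step (prepare_for_hash is a
hypothetical helper, not the exact C code):

  # Discard the low bits, which carry little entropy for aligned
  # pointers, and let Murmur mix whatever remains.
  def prepare_for_hash(v)
    v >> 16  # experimentally, 16-17 bits worked well
  end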
Add a few more benchmarks based on bm_hash_shift to ensure
we don't hurt performance too much with these tweaks.
I'm pretty confident about this change and am committing it now,
especially since we're still using Murmur behind it (but perhaps
we can update to a newer hash from Murmur...)
[ruby-core:72028] [Feature #11405]
target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby"
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name a b
hash_aref_dsym 0.279 0.276
hash_aref_dsym_long 4.951 4.936
hash_aref_fix 0.281 0.283
hash_aref_flo 0.060 0.060
hash_aref_miss 0.409 0.410
hash_aref_str 0.387 0.385
hash_aref_sym 0.275 0.270
hash_aref_sym_long 0.410 0.411
hash_flatten 0.252 0.237
hash_ident_flo 0.035 0.032
hash_ident_num 0.254 0.251
hash_ident_obj 0.252 0.256
hash_ident_str 0.250 0.252
hash_ident_sym 0.259 0.270
hash_keys 0.267 0.267
hash_shift 0.016 0.015
hash_shift_u16 0.074 0.072
hash_shift_u24 0.071 0.071
hash_shift_u32 0.073 0.072
hash_to_proc 0.008 0.008
hash_values 0.263 0.264
Speedup ratio: compare with the result of `a' (greater is better)
name b
hash_aref_dsym 1.009
hash_aref_dsym_long 1.003
hash_aref_fix 0.993
hash_aref_flo 1.001
hash_aref_miss 0.996
hash_aref_str 1.006
hash_aref_sym 1.017
hash_aref_sym_long 0.998
hash_flatten 1.061
hash_ident_flo 1.072
hash_ident_num 1.012
hash_ident_obj 0.987
hash_ident_str 0.993
hash_ident_sym 0.959
hash_keys 0.997
hash_shift 1.036
hash_shift_u16 1.039
hash_shift_u24 1.001
hash_shift_u32 1.017
hash_to_proc 1.001
hash_values 0.995
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53037 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
nil/true/false are special literals just like floats, integers,
literal strings, and symbols. Optimize `when' statements with
them by using a jump table, too.
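For example, a dispatch like this (a sketch in the spirit of the new
bm_vm2_case_lit.rb benchmark) can now jump straight to the matching
branch instead of testing each clause in turn:

  def classify(x)
    case x                  # every clause is a special literal,
    when nil   then :nil    # so the compiler emits a jump table
    when true  then :true
    when false then :false
    when 1     then :int
    when "str" then :str
    else :other
    end
  end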
target 0: a (ruby 2.3.0dev (2015-12-08 trunk 52928) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-08 master 52928) [x86_64-linux]) at "/home/ew/ruby/b/ruby"
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name a b
loop_whileloop2 0.102 0.103
vm2_case_lit* 1.657 0.549
Speedup ratio: compare with the result of `a' (greater is better)
name b
loop_whileloop2 0.988
vm2_case_lit* 3.017
* benchmark/bm_vm2_case_lit.rb: new benchmark
* compile.c (case_when_optimizable_literal): add nil/true/false
* insns.def (opt_case_dispatch): ditto
* vm.c (vm_redefinition_check_flag): ditto
* vm.c (vm_init_redefined_flag): ditto
* vm_core.h: ditto
* object.c (InitVM_Object): define === explicitly for nil/true/false
* test/ruby/test_case.rb (test_deoptimize_nil): new test
* test/ruby/test_optimization.rb (test_opt_case_dispatch): update
(test_eqq): new test
[ruby-core:71923] [Feature #11769]
Original patch by Aaron Patterson <tenderlove@ruby-lang.org>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52931 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_io_nonblock_noex2.rb: new benchmark based
on bm_io_nonblock_noex.rb
* io.c (io_read_nonblock): move documentation to prelude.rb
(io_write_nonblock): ditto
(Init_io): private, internal methods for prelude.rb use only
* prelude.rb (IO#read_nonblock): wrapper + documentation
(IO#write_nonblock): ditto
[ruby-core:71439] [Feature #11339]
rb_scan_args and hash lookups for kwargs in the C API are clumsy and
slow. Instead of improving the C API for performance, use Ruby
instead :)
Implement IO#read_nonblock and IO#write_nonblock in prelude.rb
to avoid argument parsing via rb_scan_args and hash lookups.
This speeds up IO#write_nonblock and IO#read_nonblock benchmarks
in both cases, including the original non-idiomatic case where
the `exception: false' hash is pre-allocated to avoid GC pressure.
Now, writing the kwargs in natural, idiomatic Ruby is fastest.
I've added the noex2 benchmark to show this.
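Usage is unchanged; the point is that the natural kwargs form is now
the fast path (OLD_OPTS below shows the old workaround, which is no
longer necessary):

  r, w = IO.pipe
  r.read_nonblock(16, exception: false) #=> :wait_readable (empty pipe)
  OLD_OPTS = { exception: false }.freeze # pre-allocated to avoid GC pressure
  r.read_nonblock(16, OLD_OPTS)          # old, non-idiomatic workaround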
2015-11-12 01:41:12 +0000
target 0: a (ruby 2.3.0dev (2015-11-11 trunk 52540) [x86_64-linux])
target 1: b (ruby 2.3.0dev (2015-11-11 avoid-kwarg-capi 52540)
-----------------------------------------------------------
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name a b
io_nonblock_noex 2.508 2.382
io_nonblock_noex2 2.950 1.882
Speedup ratio: compare with the result of `a' (greater is better)
name b
io_nonblock_noex 1.053
io_nonblock_noex2 1.567
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_require_thread.rb: new benchmark for conflicting
require vs thread. like [Bug #11559]
* prepare_require.rb: new file for preparing above tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52083 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_resurrect): fix resurrection of strings that are
short enough to be embedded but are not embedded.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52073 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_hash_aref_dsym.rb: new benchmark
* benchmark/bm_hash_aref_dsym_long.rb: ditto
* benchmark/bm_hash_aref_fix.rb: ditto
[ruby-core:70129] [Bug #11396] pointed out we need to consider
more cases for hashing.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51435 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Delay hash lookups until we are about to hit an exception. This
gives a minor speedup ratio of 2-3% in the new bm_io_nonblock_noex
benchmark as well as reducing code.
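A minimal Ruby sketch of the pattern (read_nonblock_sketch is a
hypothetical method; the real change is in C, in io.c and the openssl
and stringio extensions):

  def read_nonblock_sketch(io, maxlen, opts = nil)
    io.read_nonblock(maxlen)   # hot path: opts is never inspected
  rescue IO::WaitReadable
    # cold path: only now do we pay for the hash lookup
    raise unless opts && opts[:exception] == false
    :wait_readable
  end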
* benchmark/bm_io_nonblock_noex.rb: new benchmark
* ext/openssl/ossl_ssl.c (no_exception_p): new function
(ossl_start_ssl): adjust for no_exception_p
(ossl_ssl_connect): adjust ossl_start_ssl call
(ossl_ssl_connect_nonblock): ditto
(ossl_ssl_accept): ditto
(ossl_ssl_accept_nonblock): ditto
(ossl_ssl_read_internal): adjust for no_exception_p
(ossl_ssl_write_internal): ditto
(ossl_ssl_write): adjust ossl_write_internal call
(ossl_ssl_write_nonblock): ditto
* ext/stringio/stringio.c (strio_read_nonblock):
delay exception check
* io.c (no_exception_p): new function
(io_getpartial): call no_exception_p
(io_readpartial): adjust for io_getpartial
(get_kwargs_exception): remove
(io_read_nonblock): adjust for io_getpartial,
check no_exception_p on EOF
(io_write_nonblock): call no_exception_p
(rb_io_write_nonblock): do not check `exception: false'
(argf_getpartial): adjust for io_getpartial
[ruby-core:69778] [Feature #11318]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This recovers and improves performance of Marshal.dump/load on
Time objects compared to when we implemented generic ivars
entirely using st_table.
This also recovers some performance on other generic ivar objects,
but does not bring Marshal.dump/load performance all the way back up
to previous speeds.
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name trunk geniv after
marshal_dump_flo 0.343 0.334 0.335
marshal_dump_load_geniv 0.487 0.527 0.495
marshal_dump_load_time 1.262 1.401 1.257
Speedup ratio: compare with the result of `trunk' (greater is better)
name geniv after
marshal_dump_flo 1.026 1.023
marshal_dump_load_geniv 0.925 0.985
marshal_dump_load_time 0.901 1.004
* include/ruby/intern.h (rb_generic_ivar_table): deprecate
* internal.h (rb_attr_delete): declare
* marshal.c (has_ivars): use rb_ivar_foreach
(w_ivar): ditto
(w_object): update for new interface
* time.c (time_mload): use rb_attr_delete
* variable.c (generic_ivar_delete): implement
(rb_ivar_delete): ditto
(rb_attr_delete): ditto
[ruby-core:69323] [Feature #11170]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@50680 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
New benchmarks for the WB (write barrier) slowpath.
Before this change, bm_vm1_gc_wb_ary.rb tried to check the performance
of the WB slowpath (making a reference from an oldobj to a newobj).
However, from Ruby 2.2, 3 GCs are needed to promote a new object,
because only objects that have survived 3 GCs (age 3) are promoted.
To compare fastpath and slowpath, introduce new "promoted" version
benchmark.
bm_vm1_gc_wb_ary.rb is for fastpath and
bm_vm1_gc_wb_ary_promoted.rb is for slowpath.
* benchmark/bm_vm1_gc_wb_obj(_promoted).rb: ditto.
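A sketch of the "promoted" pattern (illustrative, not the exact
benchmark source):

  ary = []
  3.times { GC.start }    # three GCs promote ary to the old generation
  30_000_000.times do
    ary[0] = Object.new   # oldobj -> newobj reference: WB slowpath
  end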
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49985 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb: add --load-rawdata option to load dumped
rawdata and just output it without running the actual benchmark.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49869 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): dump the rawdata in several
formats: yaml, json, and pretty_inspect.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49868 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (BenchmarkDriver#adjusted_results):
calculate the max width of names at the end.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49856 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): fix index of results.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver.rb (show_results): support plain text style
table format output.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This avoids O(n) lookups on structs with over 10 members.
This also avoids O(n) behavior on all assignments to Struct members.
Members 0..9 still use the existing C methods for O(1) reads.
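The idea, sketched at the Ruby level (MiniStruct is a hypothetical
illustration; the real change generates equivalent bytecode from C):

  class MiniStruct
    def self.create(*members)
      Class.new do
        define_method(:initialize) { |*vals| @values = vals }
        members.each_with_index do |name, i|
          define_method(name)       { @values[i] }          # O(1) read
          define_method("#{name}=") { |v| @values[i] = v }  # O(1) write
        end
      end
    end
  end

  Big = MiniStruct.create(*(:a..:k))  # 11 members
  s = Big.new(*1..11)
  s.k #=> 11, an array index, not a linear scan over member names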
Benchmark results:
vm2_struct_big_aref_hi* 1.305
vm2_struct_big_aref_lo* 1.157
vm2_struct_big_aset* 3.306
vm2_struct_small_aref* 1.015
vm2_struct_small_aset* 3.273
Note: I chose to use loading instructions from an array instead of
writing directly to linked lists in compile.c for ease of
maintainability. We may move the method definitions to prelude.rb-like
files in the future.
I have also tested this patch with the following patch to disable
the C ref_func methods, and ensured the test suite and rubyspec
still pass:
--- a/struct.c
+++ b/struct.c
@@ -209,7 +209,7 @@ setup_struct(VALUE nstr, VALUE members)
ID id = SYM2ID(ptr_members[i]);
VALUE off = LONG2NUM(i);
- if (i < N_REF_FUNC) {
+ if (0 && i < N_REF_FUNC) {
rb_define_method_id(nstr, id, ref_func[i], 0);
}
else {
* iseq.c (rb_method_for_self_aref, rb_method_for_self_aset):
new methods to generate bytecode for struct.c
[Feature #10575]
* struct.c (rb_struct_ref, rb_struct_set): remove
(define_aref_method, define_aset_method): new functions
(setup_struct): use new functions
* test/ruby/test_struct.rb: add test for struct >10 members
* benchmark/bm_vm2_struct_big_aref_hi.rb: new benchmark
* benchmark/bm_vm2_struct_big_aref_lo.rb: ditto
* benchmark/bm_vm2_struct_big_aset.rb: ditto
* benchmark/bm_vm2_struct_small_aref.rb: ditto
* benchmark/bm_vm2_struct_small_aset.rb: ditto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Dynamic symbols hash more slowly because they need extra method
dispatch in rb_any_hash. I am not sure if dynamic symbols are
a realistic use case as hash keys, so this commit only
restores performance when comparing against versions of Ruby
which lack dsyms.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/bm_app_aobench.rb: update outdated links to the
original program. [ruby-dev:48550] [Feature #10247]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* proc.c (rb_proc_alloc): inline and move to vm.c
(rb_proc_wrap): new wrapper function used by rb_proc_alloc
(proc_dup): simplify alloc + copy + wrap operation
[ruby-core:64994]
* vm.c (rb_proc_alloc): new inline function
(rb_vm_make_proc): call rb_proc_alloc
* vm_core.h: remove rb_proc_alloc, add rb_proc_wrap
* benchmark/bm_vm2_newlambda.rb: short test to show difference
First we allocate and populate an rb_proc_t struct inline to avoid
unnecessary zeroing of the large struct. Inlining speeds up callers,
as this function takes many parameters to ensure correctness. We then
call the new rb_proc_wrap function to create the object.
rb_proc_wrap wraps an rb_proc_t pointer as a Ruby object, but
we only use it inside rb_proc_alloc. We must call it before
the compiler has a chance to clobber the VALUE parameters passed
to rb_proc_alloc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This program is described in detail in chapter 6 of "Understanding
Computation" by Tom Stuart. <http://computationbook.com/>
A Japanese translation will be published soon.
<http://www.oreilly.co.jp/books/9784873116976/>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
A doubly-linked list for tracking living threads guarantees
constant-time insert/delete performance with none of the corner cases
of a hash table. I chose this ccan implementation of doubly-linked
lists over the BSD sys/queue.h implementation since:
1) insertion and removal are both branchless
2) locality is improved if a struct may be a member of multiple lists
(0002 patch in Feature 9632 will introduce a secondary list
for waiting FDs)
This also increases cache locality during iteration, improving
performance in a new IO#close benchmark with many sleeping threads
while still scanning the same number of threads.
vm_thread_close 1.762
* vm_core.h (rb_vm_t): list_head and counter for living_threads
(rb_thread_t): vmlt_node for living_threads linkage
(rb_vm_living_threads_init): new function wrapper
(rb_vm_living_threads_insert): ditto
(rb_vm_living_threads_remove): ditto
* vm.c (rb_vm_living_threads_foreach): new function wrapper
* thread.c (terminate_i, thread_start_func_2, thread_create_core,
thread_fd_close_i, thread_fd_close): update to use new APIs
* vm.c (vm_mark_each_thread_func, rb_vm_mark, ruby_vm_destruct,
vm_memsize, vm_init2, Init_VM): ditto
* vm_trace.c (clear_trace_func_i, rb_clear_trace_func): ditto
* benchmark/bm_vm_thread_close.rb: added to show improvement
* ccan/build_assert/build_assert.h: added as a dependency of list.h
* ccan/check_type/check_type.h: ditto
* ccan/container_of/container_of.h: ditto
* ccan/licenses/BSD-MIT: ditto
* ccan/licenses/CC0: ditto
* ccan/str/str.h: ditto (stripped of unused macros)
* ccan/list/list.h: ditto
* common.mk: add CCAN_LIST_INCLUDES
[ruby-core:61871][Feature 9632 (part 1)]
Apologies for the size of this commit, but I think a good
doubly-linked list will be useful for future features, too.
This may be used to add ordering to a container_of-based hash
table to preserve compatibility if required (e.g. feature 9614).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* st.c (hash_pos): use bitwise AND to avoid the slow modulo op
(see the sketch after this list)
(new_size): power-of-two sizes for the hash_pos change
(st_numhash): adjust for common keys due to lack of prime modulo
[Feature #9425]
* hash.c (rb_any_hash): right shift for symbols
* benchmark/bm_hash_aref_miss.rb: added to show improvement
* benchmark/bm_hash_aref_sym_long.rb: ditto
* benchmark/bm_hash_aref_str.rb: ditto
* benchmark/bm_hash_aref_sym.rb: ditto
* benchmark/bm_hash_ident_num.rb: added to prevent regression
* benchmark/bm_hash_ident_obj.rb: ditto
* benchmark/bm_hash_ident_str.rb: ditto
* benchmark/bm_hash_ident_sym.rb: ditto
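A sketch of the hash_pos change (values illustrative):

  num_bins = 16            # table sizes are now powers of two
  mask     = num_bins - 1  # 0b1111
  h        = 0xdeadbeef
  h % num_bins #=> 15, integer division under the hood
  h & mask     #=> 15, a single AND instruction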
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* benchmark/driver: avoid large alloc in driver process
[ruby-core:59869] [Bug #9430]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@44772 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
than 2.0.0p353.
* benchmark/bm_hash_keys.rb: added. r43896 is about 5 times faster
than 2.0.0p353.
* benchmark/bm_hash_values.rb: added. r43896 is about 5 times faster
than 2.0.0p353.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@43898 b2dd03c8-39d4-4d8f-98ff-823fe69b080e