From the documentation of rb_obj_hash:
> Certain core classes such as Integer use built-in hash calculations and
> do not call the #hash method when used as a hash key.
So if you override, say, Integer#hash it won't be used from rb_hash_aref
and similar. This avoids method lookups in many common cases.
This commit uses the same optimization in rb_hash, a method used
internally and in the C API to get the hash value of an object. Usually
this is used to build the hash of an object based on its elements.
Previously it would always do a method lookup for 'hash'.
This is primarily intended to speed up hashing of Arrays and Hashes,
which call rb_hash for each element.
compare-ruby: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
built-ruby: ruby 3.1.0dev (2021-09-29T02:13:24Z fast_hash d670bf88b2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|hash_aref_array | 1.008| 1.769|
| | -| 1.76x|
* As the "doc/" prefix is specified by the `--page-dir` option,
remove from the rdoc references.
* Refer to the original .rdoc instead of the converted .html.
Instead of looking for Object::ENV (which can be overwritten),
directly look for the envtbl variable. As that is static in hash.c,
and the lookup code is in process.c, add a couple non-static
functions that will return envtbl (or envtbl#to_hash).
Fixes [Bug #18164]
We don't need to increment/decrement iteration level for frozen Hash
because frozen Hash can't be modified. We can assume that nobody
changes the target Hash while calling #each family.
How to reproduce:
a = {}
100.times do |i|
a[i] = true
end
Ractor.make_shareable(a)
4.times.collect do
Ractor.new(a) do |b|
100.times do
b.each_value do
end
end
end
end.each(&:take)
Example output:
internal:ractor>:267: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007fcfb087bb30 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb087b8d8 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb088d678 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb087bd88 run> terminated with exception (report_on_exception is true):
/tmp/h.rb:10:in `each_value'/tmp/h.rb:10:in `each_value': : /tmp/h.rb:10:in `each_value'no implicit conversion from nil to integer/tmp/h.rb:10:in `each_value'no implicit conversion from nil to integer (: : (TypeErrorTypeError)no implicit conversion from nil to integer)no implicit conversion from nil to integer (
(TypeErrorTypeError from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
)) from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
<internal:ractor>:694:in `take': thrown by remote Ractor. (Ractor::RemoteError)
from /tmp/h.rb:14:in `each'
from /tmp/h.rb:14:in `<main>'
/tmp/h.rb:10:in `each_value': no implicit conversion from nil to integer (TypeError)
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
This makes the compare_by_identity setting always copied
for the following methods:
* except
* merge
* reject
* select
* slice
* transform_values
Some of these methods did not copy the setting, or only
copied the setting if the receiver was not empty.
Fixes [Bug #17757]
Co-authored-by: Kenichi Kamiya <kachick1@gmail.com>
ENV.dup returned a plain Object, since all of ENV's behavior is
defined in ENV's singleton class. So using dup makes no sense.
ENV.clone works and is used in some gems, but it doesn't do what
the user expects, since modifying ENV.clone also modifies ENV.
Add a deprecation warning pointing the user to use ENV.to_h
instead.
This also undefines some private initialize* methods in ENV,
since they are not needed.
Fixes [Bug #17767]
refs:
* https://bugs.ruby-lang.org/issues/17736
* https://github.com/ruby/ruby/pull/4296
This commit aims to cover following methods
* Hash#select!
* Hash#filter!
* Hash#keep_if
* Hash#reject!
* Hash#delete_if
I think these are not all.
---
* Ensure the receiver is modifiable or not
* Assert the receiver is not modified
* Windows: Read ENV names and values as UTF-8 encoded Strings
Implements issue #12650: fix https://bugs.ruby-lang.org/issues/12650
This also removes the special encoding for ENV['PATH'] and some
complexity in the code that is unnecessary now.
* Windows: Improve readablity of getenv() encoding
getenv() did use the expected codepage as an implicit parameter of the macro.
This is mis-leading since include/ruby/win32.h has a different definition.
Using the "cp" variable explicit (like the other function calls) makes it
more readable and consistent.
* Windows: Change external C-API macros getenv() and execv() to use UTF-8
They used to process and return strings with locale encoding,
but since all ruby-internal spawn and environment functions use UTF-8,
it makes sense to change the C-API equally.
While it is expected that all environment keys are unique, that is
not enforced. It is possible by manipulating environ directly you
can call a process with an environment with duplicate keys. If
ENV.replace was passed a hash with a key where environ had a
duplicate for that key, ENV.replace would end up deleting the key
from environ.
The fix in this case is to not assume that the environment key
list has unique keys, and continue processing the entire key
list in keylist_delete.
Fixes [Bug #17254]
Adds a full discussion of #dig, along with links from Array, Hash, Struct, and OpenStruct.
CSV::Table and CSV::Row are over in ruby/csv. I'll get to them soon.
The art to the thing is to figure out how much (or how little) to say at each #dig.
* Per @nobu review
* [CI skip] Enhance rdoc intro for Hash
* Tweak call-seq for Hash.new
* Tweak call-seq for Hash.new
* Minor corrections
* Respond to review
* Respond to review
* Respond to review
* Respond to review
* Fix chain exampmle
* Response to review
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
As a semantics, Hash#each yields a 2-element array (pairs of keys and
values). So, `{ a: 1 }.each(&->(k, v) { })` should raise an exception
due to lambda's arity check.
However, the optimization that avoids Array allocation by using
rb_yield_values for blocks whose arity is more than 1 (introduced at
b9d2960337 and some commits), seemed to
overlook the lambda case, and wrongly allowed the code above to work.
This change experimentally attempts to make it strict; now the code
above raises an ArgumentError. This is an incompatible change; if the
compatibility issue is bigger than our expectation, it may be reverted
(until Ruby 3.0 release).
[Bug #12706]
ar_table can be converted to st_table just after `ar_do_hash()`
function which calls `#hash` method. We need to check
the representation to detect this mutation.
[Bug #16676]
It was found that a feature to check and add ruby2_keywords flag to an
existing Hash is needed when arguments are serialized and deserialized.
It is possible to do the same without explicit APIs, but it would be
good to provide them as a core feature.
https://github.com/rails/rails/pull/38105#discussion_r361863767
Hash.ruby2_keywords_hash?(hash) checks if hash is flagged or not.
Hash.ruby2_keywords_hash(hash) returns a duplicated hash that has a
ruby2_keywords flag,
[Bug #16486]
ar_talbe (Hash representation for <=8 size) can use transient heap
and the memory area can move. So we need to restore `pair' ptr after
`func` call (which can run any programs) because of moving.
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
Reduce macros to make them inline functions, as well as mark
MJIT_FUNC_EXPORTED functions explicitly as such.
Definition of ar_hint_t is simplified. This has been the only possible
definition so far.
Akatsuki reported ENV['TZ'] = 'UTC' improved 7x-8x faster on following code.
t = Time.now; 100000.times { Time.new(2019) }; Time.now - t
https://hackerslab.aktsk.jp/2019/12/01/141551
commit 4bc1669127(reduce tzset) dramatically improved this situation. But still,
TZ=UTC is faster than default.
This patch removs unnecessary tzset() call completely.
Performance check
----------------------
test program: t = Time.now; 100000.times { Time.new(2019) }; Time.now - t
before: 0.387sec
before(w/ TZ): 0.197sec
after: 0.162sec
after(w/ TZ): 0.165sec
OK. Now, Time creation 2x faster *and* TZ=UTC doesn't improve anything.
We can forget this hack completely. :)
Side note:
This patch slightly changes Time.new(t) behavior implicitly. Before this patch, it might changes
default timezone implicitly. But after this patch, it doesn't. You need to reset TZ
(I mean ENV['TZ'] = nil) explicitly.
But I don't think this is big impact. Don't try to change /etc/localtime on runtime.
Side note2: following test might be useful for testing "ENV['TZ'] = nil".
-----------------------------------------
% cat <<'End' | sudo sh -s
rm -f /etc/localtime-; cp -a /etc/localtime /etc/localtime-
rm /etc/localtime; ln -s /usr/share/zoneinfo/Asia/Tokyo /etc/localtime
./ruby -e '
p Time.new(2000).zone # JST
File.unlink("/etc/localtime"); File.symlink("/usr/share/zoneinfo/America/Los_Angeles", "/etc/localtime")
p Time.new(2000).zone # JST (ruby does not follow /etc/localtime modification automatically)
ENV["TZ"] = nil
p Time.new(2000).zone # PST (ruby detect /etc/localtime modification)
'
rm /etc/localtime; cp -a /etc/localtime- /etc/localtime; rm /etc/localtime-
End
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my macihne.
This removes the related tests, and puts the related specs behind
version guards. This affects all code in lib, including some
libraries that may want to support older versions of Ruby.
This removes the security features added by $SAFE = 1, and warns for access
or modification of $SAFE from Ruby-level, as well as warning when calling
all public C functions related to $SAFE.
This modifies some internal functions that took a safe level argument
to no longer take the argument.
rb_require_safe now warns, rb_require_string has been added as a
version that takes a VALUE and does not warn.
One public C function that still takes a safe level argument and that
this doesn't warn for is rb_eval_cmd. We may want to consider
adding an alternative method that does not take a safe level argument,
and warn for rb_eval_cmd.
Looking at the list of symbols inside of libruby-static.a, I found
hundreds of functions that are defined, but used from nowhere.
There can be reasons for each of them (e.g. some functions are
specific to some platform, some are useful when debugging, etc).
However it seems the functions deleted here exist for no reason.
This changeset reduces the size of ruby binary from 26,671,456
bytes to 26,592,864 bytes on my machine.
This changes object_id from being based on the objects location in
memory (or a nearby memory location in the case of a conflict) to be
based on an always increasing number.
This number is a Ruby Integer which allows it to overflow the size of a
pointer without issue (very unlikely to happen in real programs
especially on 64-bit, but a nice guarantee).
This changes obj_to_id_tbl and id_to_obj_tbl to both be maps of Ruby
objects to Ruby objects (previously they were Ruby object to C integer)
which simplifies updating them after compaction as we can run them
through gc_update_table_refs.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This changes object_id from being based on the objects location in
memory (or a nearby memory location in the case of a conflict) to be
based on an always increasing number.
This number is a Ruby Integer which allows it to overflow the size of a
pointer without issue (very unlikely to happen in real programs
especially on 64-bit, but a nice guarantee).
This changes obj_to_id_tbl and id_to_obj_tbl to both be maps of Ruby
objects to Ruby objects (previously they were Ruby object to C integer)
which simplifies updating them after compaction as we can run them
through gc_update_table_refs.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>