According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
st_copy allocates a st_table, which is not needed for hashes since it is
allocated by VWA and embedded, so this causes a memory leak.
The following script demonstrates the issue:
```ruby
20.times do
100_000.times do
{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9}
end
puts `ps -o rss= -p #{$$}`
end
```
With VWA, AR hashes are much larger than ST hashes. Hash#replace
attempts to directly copy the contents of AR hashes into ST hashes so
there will be memory corruption caused by writing past the end of memory.
This commit changes it so that if a ST hash is being replaced with an AR
hash it will insert each element into the ST hash.
[Feature #19236]
In Ruby 3.3, `Hash.new` shall print a deprecation warning if keyword arguments
are passed instead of treating them as an implicit positional Hash.
This will allow to safely introduce a `capacity` keyword argument in 3.4
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
The documentation states it returns a copy of self with nil value
entries removed. However, the previous behavior was creating a
plain new hash with non-nil values copied into it. This change
aligns the behavior with the documentation.
Fixes [Bug #19113]
This was already copied for non-empty hashes. As Hash.ruby2_keywords_hash
copies default values, it should also copy the compare_by_identity flag.
Partially Fixes [Bug #19113]
It wasn't copied for empty hashes, and Hash.[] doesn't copy the
default value, so copying the compare_by_identity flag does not
make sense.
Partially Fixes [Bug #19113]
Because of the function pointer, it's hard to figure out what hash
functions could be used in Hash objects when st_lookup is used.
Having this assertion makes it easier to understand what
hash_stlike_lookup could possibly do. (AR uses only rb_any_hash)
For example, this clarifies that hash_stlike_lookup never calls a #hash
method when a key is T_STRING or T_SYMBOL.
We should always have a T_HASH here, so we can use FL_TEST_RAW to avoid
checking whether we may have an immediate value.
I expect this to be a very small performance improvement (perf stat
./miniruby benchmark/hash_aref_miss.rb shows a ~1% improvement). It also
removes 9 instructions from rb_hash_default_value on x86_64.
On a hash miss we need to call default if it is redefined in order to
return the default value to be used. Previously we checked this with
rb_method_basic_definition_p, which avoids the method call but requires
a method lookup.
This commit replaces the previous check with BASIC_OP_UNREDEFINED_P and
a new BOP_DEFAULT. We still need to fall back to
rb_method_basic_definition_p when called on a subclasss of hash.
| |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|hash_aref_miss | 2.692| 3.531|
| | -| 1.31x|
Co-authored-by: Daniel Colson <danieljamescolson@gmail.com>
Co-authored-by: "Ian C. Anderson" <ian@iancanderson.com>
Co-authored-by: Jack McCracken <me@jackmc.xyz>
One year ago, the former method has been deprecated while the latter
has become an error. Then the 3.1 released, it is enough time to make
also the former an error.
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
Previously, because opt_aref and opt_aset don't push a frame, when they
would call rb_hash to determine the hash value of the key, the initial
level of recursion would incorrectly use the method id at the top of the
stack instead of "hash".
This commit replaces rb_exec_recursive_outer with
rb_exec_recursive_outer_mid, which takes an explicit method id, so that
we can make the hash calculation behave consistently.
rb_exec_recursive_outer was documented as being internal, so I believe
this should be okay to change.
[Feature #18683]
This allows parsers and similar libraries to create Hashes of
a certain capacity in advance. It's useful when the key and values
are streamed, hence `bulk_insert()` can't be used.
Method references is not only able to be marked up as code, also
reflects `--show-hash` option.
The bug that prevented the old rdoc from correctly parsing these
methods was fixed last month.
I used this regex:
([A-Za-z]+)\.html#(?:class|module)-[A-Za-z]+-label-([A-Za-z0-9\-\+]+)
And performed a global find & replace for this:
rdoc-ref:$1@$2
Before this change the write barrier was executed before the key and
value were actually reachable via the Hash. This could cause
inconsistencies in object coloration which would lead to accidental
collection of dup'd keys.
Example:
1. Object O is grey, Object P is white.
2. Write barrier fires O -> P
3. Write barrier does nothing
4. Malloc happens, which starts GC
5. GC colors O black
6. P is written in to O (now we have O -> P reference)
7. P is now accidentally treated as garbage
We found that we need to make Ruby objects while locking the environ
to ENV operation atomically, so we decided to use `RB_VM_LOCK_ENTER()`
instead of `env_lock`.
From the documentation of rb_obj_hash:
> Certain core classes such as Integer use built-in hash calculations and
> do not call the #hash method when used as a hash key.
So if you override, say, Integer#hash it won't be used from rb_hash_aref
and similar. This avoids method lookups in many common cases.
This commit uses the same optimization in rb_hash, a method used
internally and in the C API to get the hash value of an object. Usually
this is used to build the hash of an object based on its elements.
Previously it would always do a method lookup for 'hash'.
This is primarily intended to speed up hashing of Arrays and Hashes,
which call rb_hash for each element.
compare-ruby: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
built-ruby: ruby 3.1.0dev (2021-09-29T02:13:24Z fast_hash d670bf88b2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|hash_aref_array | 1.008| 1.769|
| | -| 1.76x|
* As the "doc/" prefix is specified by the `--page-dir` option,
remove from the rdoc references.
* Refer to the original .rdoc instead of the converted .html.
Instead of looking for Object::ENV (which can be overwritten),
directly look for the envtbl variable. As that is static in hash.c,
and the lookup code is in process.c, add a couple non-static
functions that will return envtbl (or envtbl#to_hash).
Fixes [Bug #18164]
We don't need to increment/decrement iteration level for frozen Hash
because frozen Hash can't be modified. We can assume that nobody
changes the target Hash while calling #each family.
How to reproduce:
a = {}
100.times do |i|
a[i] = true
end
Ractor.make_shareable(a)
4.times.collect do
Ractor.new(a) do |b|
100.times do
b.each_value do
end
end
end
end.each(&:take)
Example output:
internal:ractor>:267: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007fcfb087bb30 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb087b8d8 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb088d678 run> terminated with exception (report_on_exception is true):
#<Thread:0x00007fcfb087bd88 run> terminated with exception (report_on_exception is true):
/tmp/h.rb:10:in `each_value'/tmp/h.rb:10:in `each_value': : /tmp/h.rb:10:in `each_value'no implicit conversion from nil to integer/tmp/h.rb:10:in `each_value'no implicit conversion from nil to integer (: : (TypeErrorTypeError)no implicit conversion from nil to integer)no implicit conversion from nil to integer (
(TypeErrorTypeError from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
)) from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
<internal:ractor>:694:in `take': thrown by remote Ractor. (Ractor::RemoteError)
from /tmp/h.rb:14:in `each'
from /tmp/h.rb:14:in `<main>'
/tmp/h.rb:10:in `each_value': no implicit conversion from nil to integer (TypeError)
from /tmp/h.rb:10:in `block (3 levels) in <main>'
from /tmp/h.rb:9:in `times'
from /tmp/h.rb:9:in `block (2 levels) in <main>'
This makes the compare_by_identity setting always copied
for the following methods:
* except
* merge
* reject
* select
* slice
* transform_values
Some of these methods did not copy the setting, or only
copied the setting if the receiver was not empty.
Fixes [Bug #17757]
Co-authored-by: Kenichi Kamiya <kachick1@gmail.com>