rb_fstring behavior in this case is to freeze the receiver. I'm
not sure if that should be changed, so this takes the conservative
approach of duping the receiver in String#-@ before passing
to rb_fstring.
Fixes [Bug #15926]
* string.c (get_reg_grapheme_cluster): make regexp from properly
encoded sources fro wide-char encodings. [Bug #15965]
* regparse.c (node_extended_grapheme_cluster): suppress false
duplicated range warning for the time being.
When a string is #frozen, it's capacity is resized to fit (if it is much
larger), since we know it will no longer be mutated.
> puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
{"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
> puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
{"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...
(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)
Previously, if we dedup into an fstring, using String#-@, capacity would
not be reduced.
> puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
{"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...
This commit makes rb_fstring call rb_str_resize, the same as
rb_str_freeze does.
Closes: https://github.com/ruby/ruby/pull/2256
Registering a string that depend on a dependent string as fstring
can lead to use-after-free. See c06ddfe and 3f95620 for details.
The following script triggers use-after-free on trunk, 2.4.6, 2.5.5
and 2.6.3. Credits to @wanabe for using eval as a cross-version way
of registering a fstring.
```ruby
a = ('j' * 24).b.b
eval('', binding, a)
p a
4.times { GC.start }
p a
```
- string.c (str_replace_shared_without_enc): when given a
dependent string, depend on the root of the dependent
string.
[Bug #15934]
* string.c (str_replace_shared_without_enc): free previous buffer
before replaced.
* parse.y (gettable): make sure in advance that the `__FILE__`
object shares a fstring, to get rid of replacement with the
fstring later.
TODO: this hack may be needed in other places.
[Bug #15916]
Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
This is a follow up for 3f9562015e.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.
Such string looks like:
```
-------- -----------------
| root | ------ owns -----> | root's buffer |
-------- -----------------
^ ^ ^
----------- | |
| shared1 | ------ references ----- |
----------- |
^ |
----------- |
| shared2 | ------ references ---------
-----------
```
This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:
```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2); /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
str_make_independent(str); /* no frozen check */
}
```
If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:
```
----------- --------------------
| shared1 | -------- owns --------> | shared1's buffer |
----------- --------------------
^
|
----------- -------------------------
| shared2 | ------ references ----> | root's buffer (freed) |
----------- -------------------------
```
Here is a reproduction script for the situation this commit fixes.
```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```
- string.c (str_duplicate): always share with the root string when
the original is a shared string.
- test_rb_str_dup.rb: specifically test `rb_str_dup` to make
sure it does not try to share with a shared string.
[Bug #15792]
Closes: https://github.com/ruby/ruby/pull/2159
* string.c (str_duplicate): share the root shared string if the
original string is already sharing, so that all shared strings
refer the root shared string directly. indirect sharing can
cause a dangling pointer.
[Bug #15792]
* string.c (rb_str_split_m): warn use of non-nil $;.
* string.c (rb_fs_setter): warn when set to non-nil value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: remove <code> markups, which are not only unnecessary
but also prevented cross-references.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_crypt): fix indent not to make the whole list
verbatim entirely.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_coderange): respect the actual encoding of
if a BOM presents, and scan for the actual code range.
[ruby-core:91662] [Bug #15635]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Officially states that String#dump is intended for round-trip.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* eval_error.c (print_errinfo): defer escaping control char in
error messages until writing to stderr, instead of quoting at
building the message. [ruby-core:90853] [Bug #15497]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
And its friends: lines, chars, grapheme_clusters, and codepoints.
[Feature #6670] [ruby-core:90728]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
And its friends: lines, chars, grapheme_clusters, and codepoints.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, there is no usage of capitalizing the first letter
in a word or a sentence. Words with mixed capitalization are not used
at all.
We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.
* enc/unicode.c: Actual implementation
* string.c: Add mention of this special case for documentation
* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
that uses String#capitalize on some (including nonsensical)
combinations of MTAVRULI and Mkhedruli, and a canary test to
detect the potential assignment of characters to the currently
open slots (holes) at U+1CBB and U+1CBC.
* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
expectation data.
Together with r65933, this closes issue #14839.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Especially over checking argc then calling rb_scan_args just to
raise an ArgumentError.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Unicode Text Segmentation considers CRLF as a character. [Bug #15337]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
It seems that decades ago, ruby was written under assumption that
char is unsigned. Which is of course a false assumption. We
need to explicitly store a numeric value into an unsigned char
variable to tell we expect 0..255 value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e