Embedded shared strings cannot be moved because strings point into the
slot of the shared string. There may be code using the RSTRING_PTR on
the stack, which would pin the string but not pin the shared string,
causing it to move.
`String#+@` is 2-3 times faster than `String#dup` because it can
directly go through `rb_str_dup` instead of using the generic
much slower `rb_obj_dup`.
This fact led to the existance of the ugly `Performance/UnfreezeString`
rubocop performance rule that encourage users to rewrite the much
more readable and convenient `"foo".dup` into the ugly `(+"foo")`.
Let's make that rubocop rule useless.
```
compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22]
last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
warming up..
| |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|uplus | 16.312M| 16.332M|
| | -| 1.00x|
|dup | 5.912M| 16.329M|
| | -| 2.76x|
```
Some code out there blind calls `force_encoding` without checking
what the original encoding was, which clears the coderange uselessly.
If the String is big, it can be a rather costly mistake.
For instance the `rack-utf8_sanitizer` gem does this on request
bodies.
Previously we used the next character following the found prefix to
determine if the match ended on a broken character.
This had caused surprising behaviour when a valid character was followed
by a UTF-8 continuation byte.
This commit changes the behaviour to instead look for the end of the
last character in the prefix.
[Bug #19784]
Co-authored-by: ywenc <ywenc@github.com>
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Previously, the following crashed due to use-after-free
with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl):
```ruby
str = 'a' * (32*1024*1024)
p({z: str})
```
32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash
could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values.
Under a debugger, one can see the `str2` of rb_str_buf_append()
getting prematurely collected while str_buf_cat4() allocates capacity.
Add GC guards so the buffer of `str2` lives across the GC run
initiated in str_buf_cat4().
[Bug #19792]
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
When String#split is used with an empty string as the field seperator it
effectively splits the original string into chars, and there is a
pre-existing fast path for this using SPLIT_TYPE_CHARS.
However this path creates an empty array in the smallest size pool and
grows from there, despite already knowing the size of the desired array.
This commit pre-allocates the correct size array in this case in order
to allow the arrays to be embedded and avoid being allocated in the
transient heap
* Unify length field for embedded and heap strings
The length field is of the same type and position in RString for both
embedded and heap allocated strings, so we can unify it.
* Remove RSTRING_EMBED_LEN
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
bytesplice(index, length, str, str_index, str_length) -> string
bytesplice(range, str, str_range) -> string
In these forms, the content of +self+ is replaced by str.byteslice(str_index, str_length) or str.byteslice(str_range); however the substring of +str+ is not allocated as a new string.
In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.
This change should be included in Ruby 3.2.1.
str_enc_copy_direct copies the string encoding over without checking the
frozen status of the string. Because we know that we're safe here (we
only use this function when interpolating strings on the stack via a
concatstrings instruction) we can safely skip this check