Граф коммитов

1525 Коммитов

Автор SHA1 Сообщение Дата
Jeremy Evans 0f283054e7 Check that String#scrub block does not modify receiver
Similar to the check used for String#gsub.  Can fix possible
segfault.

Fixes [Bug #15941]
2019-07-02 08:34:01 -07:00
Jeremy Evans 7582287eb2 Make String#-@ not freeze receiver if called on unfrozen subclass instance
rb_fstring behavior in this case is to freeze the receiver.  I'm
not sure if that should be changed, so this takes the conservative
approach of duping the receiver in String#-@ before passing
to rb_fstring.

Fixes [Bug #15926]
2019-07-02 08:26:50 -07:00
git a88107c44d * expand tabs. 2019-06-29 10:17:37 +09:00
Nobuyoshi Nakada 2f6cc15cdb
Fixed String#grapheme_clusters with wide encodings
* string.c (get_reg_grapheme_cluster): make regexp from properly
  encoded sources fro wide-char encodings.  [Bug #15965]

* regparse.c (node_extended_grapheme_cluster): suppress false
  duplicated range warning for the time being.
2019-06-29 10:10:17 +09:00
John Hawthorn 04bc4c0662 Resize capacity for fstring
When a string is #frozen, it's capacity is resized to fit (if it is much
larger), since we know it will no longer be mutated.

    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...

(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)

Previously, if we dedup into an fstring, using String#-@, capacity would
not be reduced.

    > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...

This commit makes rb_fstring call rb_str_resize, the same as
rb_str_freeze does.

Closes: https://github.com/ruby/ruby/pull/2256
2019-06-26 15:01:48 +09:00
git 551ef27490 * expand tabs. 2019-06-21 22:48:50 +09:00
Nobuyoshi Nakada 8f51da5d41
Get rid of undefined behavior
* string.c (rb_str_sub_bang): str and repl can be same.
  [Bug #15946]
2019-06-21 22:42:35 +09:00
Nobuyoshi Nakada 8797f48373
New buffer for shared string
* string.c (rb_str_init): allocate new buffer if the string is
  shared.  [Bug #15937]
2019-06-19 14:39:19 +09:00
Nobuyoshi Nakada 28678997e4
Preserve the string content at self-copying
* string.c (rb_str_init): preserve the embedded content when
  self-copying with a capacity.  [Bug #15937]
2019-06-19 09:44:26 +09:00
Nobuyoshi Nakada 8b3774be3d
Fix memory leak
* string.c (str_make_independent_expand): free independent buffer.
  [Bug# 15935]

Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-18 13:40:04 +09:00
git c770c98ac4 * expand tabs. 2019-06-18 12:21:38 +09:00
Alan Wu 9dec4e8fc3
String#b: Don't depend on dependent string
Registering a string that depend on a dependent string as fstring
can lead to use-after-free. See c06ddfe and 3f95620 for details.

The following script triggers use-after-free on trunk, 2.4.6, 2.5.5
and 2.6.3. Credits to @wanabe for using eval as a cross-version way
of registering a fstring.

```ruby
a = ('j' * 24).b.b
eval('', binding, a)

p a
4.times { GC.start }
p a
```

 - string.c (str_replace_shared_without_enc): when given a
   dependent string, depend on the root of the dependent
   string.

[Bug #15934]
2019-06-18 12:18:13 +09:00
Nobuyoshi Nakada 53e9908d8a
Fix memory leak
* string.c (str_replace_shared_without_enc): free previous buffer
  before replaced.

* parse.y (gettable): make sure in advance that the `__FILE__`
  object shares a fstring, to get rid of replacement with the
  fstring later.
  TODO: this hack may be needed in other places.

[Bug #15916]

Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-16 23:51:22 +09:00
Nobuyoshi Nakada d2003a6d39
Symbol just represents a name 2019-05-14 00:30:08 +09:00
Alan Wu c06ddfee87
str_duplicate: Don't share with a frozen shared string
This is a follow up for 3f9562015e.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.

Such string looks like:

```
 --------                    -----------------
 | root | ------ owns -----> | root's buffer |
 --------                    -----------------
     ^                             ^   ^
 -----------                       |   |
 | shared1 | ------ references -----   |
 -----------                           |
     ^                                 |
 -----------                           |
 | shared2 | ------ references ---------
 -----------
```

This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:

```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}
```

If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:

```
 -----------                         --------------------
 | shared1 | -------- owns --------> | shared1's buffer |
 -----------                         --------------------
      ^
      |
 -----------                         -------------------------
 | shared2 | ------ references ----> | root's buffer (freed) |
 -----------                         -------------------------
```

Here is a reproduction script for the situation this commit fixes.

```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```

 - string.c (str_duplicate): always share with the root string when
   the original is a shared string.
 - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
   sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159
2019-05-09 10:04:19 +09:00
Nobuyoshi Nakada f1b0db2c70
Revert "UTF-8 is one of byte based encodings"
This reverts commit 5776ae3475.

Mistaken `max` as `min`.
2019-05-06 11:02:12 +09:00
Marcus Stollsteimer 35ff4ed47f Improve documentation for String#{dump,undump} 2019-05-05 09:51:40 +02:00
git 04fd98d596 * expand tabs. 2019-05-03 23:59:58 +09:00
Nobuyoshi Nakada 77440e949b
Improve performance of case-conversion methods 2019-05-03 23:59:18 +09:00
Nobuyoshi Nakada 5776ae3475
UTF-8 is one of byte based encodings 2019-05-03 15:33:59 +09:00
git 5c87bb3b90 * expand tabs. 2019-05-02 22:44:43 +09:00
Nobuyoshi Nakada 5e23b1138f
Fix potential memory leak 2019-05-02 22:44:20 +09:00
Urabe, Shyouhei f4c68640d6 this variable is not guaranteed aligned
No problem for unaligned-ness because we never dereference.
2019-04-29 21:52:44 +09:00
Urabe, Shyouhei 7c0f513e97 fix typo 2019-04-29 21:52:44 +09:00
Nobuyoshi Nakada 3f9562015e
Get rid of indirect sharing
* string.c (str_duplicate): share the root shared string if the
  original string is already sharing, so that all shared strings
  refer the root shared string directly.  indirect sharing can
  cause a dangling pointer.

[Bug #15792]
2019-04-27 21:26:42 +09:00
nobu 4d1f86a1ff string.c: warn non-nil $;
* string.c (rb_str_split_m): warn use of non-nil $;.

* string.c (rb_fs_setter): warn when set to non-nil value.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-18 09:34:40 +00:00
nobu e1eb54b99d string.c: improve splitting into chars
* string.c (rb_str_split_m): improve splitting into chars by an
  empty string, without a regexp.

    Comparison:
                           to_chars-1
              built-ruby:   1273527.6 i/s
            compare-ruby:    189423.3 i/s - 6.72x  slower

                          to_chars-10
              built-ruby:    120993.5 i/s
            compare-ruby:     37075.8 i/s - 3.26x  slower

                         to_chars-100
              built-ruby:     15646.4 i/s
            compare-ruby:      4012.1 i/s - 3.90x  slower

                        to_chars-1000
              built-ruby:      1295.1 i/s
            compare-ruby:       408.5 i/s - 3.17x  slower

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-17 05:34:46 +00:00
nobu 46968fab0a string.c: [DOC] fix reference to sprintf [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:35:27 +00:00
nobu 8b49e5b47d string.c: [DOC] remove unnecessary markups [ci skip]
* string.c: remove <code> markups, which are not only unnecessary
  but also prevented cross-references.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:31:44 +00:00
nobu a265141c84 string.c: [DOC] fix indent [ci skip]
* string.c (rb_str_crypt): fix indent not to make the whole list
  verbatim entirely.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:17:16 +00:00
nobu 593505ac6f string.c: respect the actual encoding
* string.c (rb_enc_str_coderange): respect the actual encoding of
  if a BOM presents, and scan for the actual code range.
  [ruby-core:91662] [Bug #15635]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-05 00:32:15 +00:00
nobu fb84b86be0 * string.c (chopped_length): early return for empty strings
[Bug #11391]

From: Franck Verrot <franck@verrot.fr>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-07 07:39:47 +00:00
kazu bdbc8a8f12 Add more example of `String#dump`
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-22 12:43:57 +00:00
samuel 502f159421 Improvements to documentation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66897 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 10:56:29 +00:00
mame 6891a1cd5d string.c (rb_str_dump): Fix the rdoc
* Officially states that String#dump is intended for round-trip.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 08:51:51 +00:00
nobu d7976d1451 Use `&` instead of `modulo`
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66830 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-15 12:05:46 +00:00
shyouhei d154bec0d5 setbyte / ungetbyte allow out-of-range integers
* string.c: String#setbyte to accept arbitrary integers [Bug #15460]

* io.c: ditto for IO#ungetbyte

* ext/strringio/stringio.c: ditto for StringIO#ungetbyte



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-15 06:41:58 +00:00
nobu 50784a0a44 Defer escaping control char in error messages
* eval_error.c (print_errinfo): defer escaping control char in
  error messages until writing to stderr, instead of quoting at
  building the message.  [ruby-core:90853] [Bug #15497]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-08 09:08:31 +00:00
mame 94bdc4edf0 string.c: remove the deprecation warnings of `String#bytes` with block
And its friends: lines, chars, grapheme_clusters, and codepoints.
[Feature #6670] [ruby-core:90728]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 14:43:25 +00:00
mame 0df1de8b32 Revert "string.c: remove the deprecation warnings of `String#bytes` with block"
Forgot to write the ticket number in the commit log...

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 14:42:07 +00:00
mame 2b21744efa string.c: remove the deprecation warnings of `String#bytes` with block
And its friends: lines, chars, grapheme_clusters, and codepoints.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 08:52:19 +00:00
stomar 31dc65b275 string.c: [DOC] fix typos
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-12 22:04:48 +00:00
duerst 3628eae2e7 implement special behavior for Georgian for String#capitalize
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, there is no usage of capitalizing the first letter
in a word or a sentence. Words with mixed capitalization are not used
at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 23:14:29 +00:00
naruse e39a83a150 suppress warning: unused variable 'vbits'
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 10:42:35 +00:00
nobu 98e65d9d92 Prefer rb_check_arity when 0 or 1 arguments
Especially over checking argc then calling rb_scan_args just to
raise an ArgumentError.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 07:49:24 +00:00
shyouhei 37c22bd945 string.c: [DOC] deprecate String#crypt [ci skip] [Feature #14915]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-03 05:46:46 +00:00
svn 64148e66b6 * expand tabs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 12:26:11 +00:00
naruse 1f9731654e fix r65954; Keep tainty
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65956 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 12:26:07 +00:00
naruse 7850586af4 Don't use single byte optimization on grapheme clusters
Unicode Text Segmentation considers CRLF as a character. [Bug #15337]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 11:53:19 +00:00
shyouhei 953091a4b1 char is not unsigned
It seems that decades ago, ruby was written under assumption that
char is unsigned.  Which is of course a false assumption.  We
need to explicitly store a numeric value into an unsigned char
variable to tell we expect 0..255 value.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-21 08:51:39 +00:00