Граф коммитов

1637 Коммитов

Автор SHA1 Сообщение Дата
git 132b7eb104 * expand tabs. [ci skip] 2019-08-15 08:16:14 +09:00
Jeremy Evans 082424ef58 Fold to lowercase intead of uppercase for String#casecmp
strcasecmp(3) and String#casecmp? both fold to lowercase.
2019-08-14 14:11:39 -07:00
Aaron Patterson 957bdfbab8
Update docs to use more natural English
Just a few updates to make the English sound a bit more natural
2019-08-12 12:21:37 -04:00
Yusuke Endoh 8d302c914c string.c (rb_str_sub, _gsub): improve the rdoc
This change:

* Added an explanation about back references except \n and \k<n>
  (\` \& \' \+ \0)
* Added an explanation about an escape (\\)
* Added some rdoc references
* Rephrased and clarified the reason why double escape is needed, added
  some examples, and moved the note to the last (because it is not
  specific to the method itself).
2019-08-12 23:28:35 +09:00
卜部昌平 b5146e375a
leafify opt_plus
Inspired by 346aa557b3

Closes: https://github.com/ruby/ruby/pull/2321
2019-08-06 20:59:19 +09:00
Takashi Kokubun 346aa557b3
Make opt_eq and opt_neq insns leaf
# Benchmark zero?

```
require 'benchmark/ips'

Numeric.class_eval do
  def ruby_zero?
    self == 0
  end
end

Benchmark.ips do |x|
  x.report('0.zero?') { 0.ruby_zero? }
  x.report('1.zero?') { 1.ruby_zero? }
  x.compare!
end
```

## VM
No significant impact for VM.

### before
ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]

  0.zero?: 21855445.5 i/s
  1.zero?: 21770817.3 i/s - same-ish: difference falls within error

### after
ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]

  1.zero?: 21958912.3 i/s
  0.zero?: 21881625.9 i/s - same-ish: difference falls within error

## JIT
The performance improves about 1.23x.

### before
ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]

  0.zero?: 36343111.6 i/s
  1.zero?: 36295153.3 i/s - same-ish: difference falls within error

### after
ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]

  0.zero?: 44740467.2 i/s
  1.zero?: 44363616.1 i/s - same-ish: difference falls within error

# Benchmark str == str / str != str

```
# frozen_string_literal: true
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('a == a') { 'a' == 'a' }
  x.report('a == b') { 'a' == 'b' }
  x.report('a != a') { 'a' != 'a' }
  x.report('a != b') { 'a' != 'b' }
  x.compare!
end
```

## VM
No significant impact for VM.

### before
ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) [x86_64-linux]

  a == a: 27286219.0 i/s
  a != a: 24892389.5 i/s - 1.10x  slower
  a == b: 23623635.8 i/s - 1.16x  slower
  a != b: 21800958.0 i/s - 1.25x  slower

### after
ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) [x86_64-linux]

  a == a: 27224016.2 i/s
  a != a: 24490109.5 i/s - 1.11x  slower
  a == b: 23391052.4 i/s - 1.16x  slower
  a != b: 21811321.7 i/s - 1.25x  slower

## JIT
The performance improves on JIT a little.

### before
ruby 2.7.0dev (2019-08-04T02:56:02Z master 2d8c037e97) +JIT [x86_64-linux]

  a == a: 42010674.7 i/s
  a != a: 38920311.2 i/s - same-ish: difference falls within error
  a == b: 32574262.2 i/s - 1.29x  slower
  a != b: 32099790.3 i/s - 1.31x  slower

### after
ruby 2.7.0dev (2019-08-04T11:17:10Z opt-eq-leaf 6404bebd6a) +JIT [x86_64-linux]

  a == a: 46902738.8 i/s
  a != a: 43097258.6 i/s - 1.09x  slower
  a == b: 35822018.4 i/s - 1.31x  slower
  a != b: 33377257.8 i/s - 1.41x  slower

This is needed towards Bug#15589.
Closes: https://github.com/ruby/ruby/pull/2318
2019-08-04 22:20:12 +09:00
Nobuyoshi Nakada 1d1f98d49c
Reuse match data
* string.c (rb_str_split_m): reuse occupied match data.  [Bug #16024]
2019-07-28 07:33:21 +09:00
Nobuyoshi Nakada f1b76ea63c
Occupy match data
* string.c (rb_str_split_m): occupy match data not to be modified
  during yielding the block.  [Bug #16024]
2019-07-27 21:54:34 +09:00
Yusuke Endoh 43c337dfc1 string.c (str_succ): refactoring
Use more communicative variable name
2019-07-14 23:09:24 +09:00
Yusuke Endoh 3fd086ed56 string.c (str_succ): remove a unnecessary assignment
This change will suppress Coverity Scan warnings
2019-07-14 23:09:24 +09:00
git 9987296b8b * expand tabs. 2019-07-14 17:16:35 +09:00
Yusuke Endoh 934e6b2aeb Prefer `rb_error_arity` to `rb_check_arity` when it can be used 2019-07-14 17:16:19 +09:00
Jeremy Evans 0f283054e7 Check that String#scrub block does not modify receiver
Similar to the check used for String#gsub.  Can fix possible
segfault.

Fixes [Bug #15941]
2019-07-02 08:34:01 -07:00
Jeremy Evans 7582287eb2 Make String#-@ not freeze receiver if called on unfrozen subclass instance
rb_fstring behavior in this case is to freeze the receiver.  I'm
not sure if that should be changed, so this takes the conservative
approach of duping the receiver in String#-@ before passing
to rb_fstring.

Fixes [Bug #15926]
2019-07-02 08:26:50 -07:00
git a88107c44d * expand tabs. 2019-06-29 10:17:37 +09:00
Nobuyoshi Nakada 2f6cc15cdb
Fixed String#grapheme_clusters with wide encodings
* string.c (get_reg_grapheme_cluster): make regexp from properly
  encoded sources fro wide-char encodings.  [Bug #15965]

* regparse.c (node_extended_grapheme_cluster): suppress false
  duplicated range warning for the time being.
2019-06-29 10:10:17 +09:00
John Hawthorn 04bc4c0662 Resize capacity for fstring
When a string is #frozen, it's capacity is resized to fit (if it is much
larger), since we know it will no longer be mutated.

    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
    > puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...

(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)

Previously, if we dedup into an fstring, using String#-@, capacity would
not be reduced.

    > puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
    {"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...

This commit makes rb_fstring call rb_str_resize, the same as
rb_str_freeze does.

Closes: https://github.com/ruby/ruby/pull/2256
2019-06-26 15:01:48 +09:00
git 551ef27490 * expand tabs. 2019-06-21 22:48:50 +09:00
Nobuyoshi Nakada 8f51da5d41
Get rid of undefined behavior
* string.c (rb_str_sub_bang): str and repl can be same.
  [Bug #15946]
2019-06-21 22:42:35 +09:00
Nobuyoshi Nakada 8797f48373
New buffer for shared string
* string.c (rb_str_init): allocate new buffer if the string is
  shared.  [Bug #15937]
2019-06-19 14:39:19 +09:00
Nobuyoshi Nakada 28678997e4
Preserve the string content at self-copying
* string.c (rb_str_init): preserve the embedded content when
  self-copying with a capacity.  [Bug #15937]
2019-06-19 09:44:26 +09:00
Nobuyoshi Nakada 8b3774be3d
Fix memory leak
* string.c (str_make_independent_expand): free independent buffer.
  [Bug# 15935]

Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-18 13:40:04 +09:00
git c770c98ac4 * expand tabs. 2019-06-18 12:21:38 +09:00
Alan Wu 9dec4e8fc3
String#b: Don't depend on dependent string
Registering a string that depend on a dependent string as fstring
can lead to use-after-free. See c06ddfe and 3f95620 for details.

The following script triggers use-after-free on trunk, 2.4.6, 2.5.5
and 2.6.3. Credits to @wanabe for using eval as a cross-version way
of registering a fstring.

```ruby
a = ('j' * 24).b.b
eval('', binding, a)

p a
4.times { GC.start }
p a
```

 - string.c (str_replace_shared_without_enc): when given a
   dependent string, depend on the root of the dependent
   string.

[Bug #15934]
2019-06-18 12:18:13 +09:00
Nobuyoshi Nakada 53e9908d8a
Fix memory leak
* string.c (str_replace_shared_without_enc): free previous buffer
  before replaced.

* parse.y (gettable): make sure in advance that the `__FILE__`
  object shares a fstring, to get rid of replacement with the
  fstring later.
  TODO: this hack may be needed in other places.

[Bug #15916]

Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
2019-06-16 23:51:22 +09:00
Nobuyoshi Nakada d2003a6d39
Symbol just represents a name 2019-05-14 00:30:08 +09:00
Alan Wu c06ddfee87
str_duplicate: Don't share with a frozen shared string
This is a follow up for 3f9562015e.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.

Such string looks like:

```
 --------                    -----------------
 | root | ------ owns -----> | root's buffer |
 --------                    -----------------
     ^                             ^   ^
 -----------                       |   |
 | shared1 | ------ references -----   |
 -----------                           |
     ^                                 |
 -----------                           |
 | shared2 | ------ references ---------
 -----------
```

This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:

```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}
```

If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:

```
 -----------                         --------------------
 | shared1 | -------- owns --------> | shared1's buffer |
 -----------                         --------------------
      ^
      |
 -----------                         -------------------------
 | shared2 | ------ references ----> | root's buffer (freed) |
 -----------                         -------------------------
```

Here is a reproduction script for the situation this commit fixes.

```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```

 - string.c (str_duplicate): always share with the root string when
   the original is a shared string.
 - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
   sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159
2019-05-09 10:04:19 +09:00
Nobuyoshi Nakada f1b0db2c70
Revert "UTF-8 is one of byte based encodings"
This reverts commit 5776ae3475.

Mistaken `max` as `min`.
2019-05-06 11:02:12 +09:00
Marcus Stollsteimer 35ff4ed47f Improve documentation for String#{dump,undump} 2019-05-05 09:51:40 +02:00
git 04fd98d596 * expand tabs. 2019-05-03 23:59:58 +09:00
Nobuyoshi Nakada 77440e949b
Improve performance of case-conversion methods 2019-05-03 23:59:18 +09:00
Nobuyoshi Nakada 5776ae3475
UTF-8 is one of byte based encodings 2019-05-03 15:33:59 +09:00
git 5c87bb3b90 * expand tabs. 2019-05-02 22:44:43 +09:00
Nobuyoshi Nakada 5e23b1138f
Fix potential memory leak 2019-05-02 22:44:20 +09:00
Urabe, Shyouhei f4c68640d6 this variable is not guaranteed aligned
No problem for unaligned-ness because we never dereference.
2019-04-29 21:52:44 +09:00
Urabe, Shyouhei 7c0f513e97 fix typo 2019-04-29 21:52:44 +09:00
Nobuyoshi Nakada 3f9562015e
Get rid of indirect sharing
* string.c (str_duplicate): share the root shared string if the
  original string is already sharing, so that all shared strings
  refer the root shared string directly.  indirect sharing can
  cause a dangling pointer.

[Bug #15792]
2019-04-27 21:26:42 +09:00
nobu 4d1f86a1ff string.c: warn non-nil $;
* string.c (rb_str_split_m): warn use of non-nil $;.

* string.c (rb_fs_setter): warn when set to non-nil value.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-18 09:34:40 +00:00
nobu e1eb54b99d string.c: improve splitting into chars
* string.c (rb_str_split_m): improve splitting into chars by an
  empty string, without a regexp.

    Comparison:
                           to_chars-1
              built-ruby:   1273527.6 i/s
            compare-ruby:    189423.3 i/s - 6.72x  slower

                          to_chars-10
              built-ruby:    120993.5 i/s
            compare-ruby:     37075.8 i/s - 3.26x  slower

                         to_chars-100
              built-ruby:     15646.4 i/s
            compare-ruby:      4012.1 i/s - 3.90x  slower

                        to_chars-1000
              built-ruby:      1295.1 i/s
            compare-ruby:       408.5 i/s - 3.17x  slower

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-17 05:34:46 +00:00
nobu 46968fab0a string.c: [DOC] fix reference to sprintf [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:35:27 +00:00
nobu 8b49e5b47d string.c: [DOC] remove unnecessary markups [ci skip]
* string.c: remove <code> markups, which are not only unnecessary
  but also prevented cross-references.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:31:44 +00:00
nobu a265141c84 string.c: [DOC] fix indent [ci skip]
* string.c (rb_str_crypt): fix indent not to make the whole list
  verbatim entirely.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-20 01:17:16 +00:00
nobu 593505ac6f string.c: respect the actual encoding
* string.c (rb_enc_str_coderange): respect the actual encoding of
  if a BOM presents, and scan for the actual code range.
  [ruby-core:91662] [Bug #15635]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-05 00:32:15 +00:00
nobu fb84b86be0 * string.c (chopped_length): early return for empty strings
[Bug #11391]

From: Franck Verrot <franck@verrot.fr>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67018 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-07 07:39:47 +00:00
kazu bdbc8a8f12 Add more example of `String#dump`
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-22 12:43:57 +00:00
samuel 502f159421 Improvements to documentation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66897 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 10:56:29 +00:00
mame 6891a1cd5d string.c (rb_str_dump): Fix the rdoc
* Officially states that String#dump is intended for round-trip.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 08:51:51 +00:00
nobu d7976d1451 Use `&` instead of `modulo`
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66830 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-15 12:05:46 +00:00
shyouhei d154bec0d5 setbyte / ungetbyte allow out-of-range integers
* string.c: String#setbyte to accept arbitrary integers [Bug #15460]

* io.c: ditto for IO#ungetbyte

* ext/strringio/stringio.c: ditto for StringIO#ungetbyte



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-15 06:41:58 +00:00
nobu 50784a0a44 Defer escaping control char in error messages
* eval_error.c (print_errinfo): defer escaping control char in
  error messages until writing to stderr, instead of quoting at
  building the message.  [ruby-core:90853] [Bug #15497]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-08 09:08:31 +00:00
mame 94bdc4edf0 string.c: remove the deprecation warnings of `String#bytes` with block
And its friends: lines, chars, grapheme_clusters, and codepoints.
[Feature #6670] [ruby-core:90728]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 14:43:25 +00:00
mame 0df1de8b32 Revert "string.c: remove the deprecation warnings of `String#bytes` with block"
Forgot to write the ticket number in the commit log...

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 14:42:07 +00:00
mame 2b21744efa string.c: remove the deprecation warnings of `String#bytes` with block
And its friends: lines, chars, grapheme_clusters, and codepoints.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-26 08:52:19 +00:00
stomar 31dc65b275 string.c: [DOC] fix typos
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-12 22:04:48 +00:00
duerst 3628eae2e7 implement special behavior for Georgian for String#capitalize
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, there is no usage of capitalizing the first letter
in a word or a sentence. Words with mixed capitalization are not used
at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 23:14:29 +00:00
naruse e39a83a150 suppress warning: unused variable 'vbits'
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 10:42:35 +00:00
nobu 98e65d9d92 Prefer rb_check_arity when 0 or 1 arguments
Especially over checking argc then calling rb_scan_args just to
raise an ArgumentError.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 07:49:24 +00:00
shyouhei 37c22bd945 string.c: [DOC] deprecate String#crypt [ci skip] [Feature #14915]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-03 05:46:46 +00:00
svn 64148e66b6 * expand tabs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 12:26:11 +00:00
naruse 1f9731654e fix r65954; Keep tainty
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65956 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 12:26:07 +00:00
naruse 7850586af4 Don't use single byte optimization on grapheme clusters
Unicode Text Segmentation considers CRLF as a character. [Bug #15337]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-24 11:53:19 +00:00
shyouhei 953091a4b1 char is not unsigned
It seems that decades ago, ruby was written under assumption that
char is unsigned.  Which is of course a false assumption.  We
need to explicitly store a numeric value into an unsigned char
variable to tell we expect 0..255 value.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-21 08:51:39 +00:00
shyouhei 7213568733 string.c: setbyte silently ignores upper bits
The behaviour of String#setbyte has been depending on the width
of int, which is not portable.  Must check explicitly.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65804 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-19 09:52:46 +00:00
shyouhei 74fe1cc3d9 string.c: this assumption is false [ci skip]
Looking at the lines right above, it is clear than a blue sky
that we cannot assume `p` to be aligned at all when
UNALIGNED_WORD_ACCESS is true.  It is a wrong idea to use
__builtin_assume_aligned for that situation.

See also: https://travis-ci.org/ruby/ruby/jobs/451710732#L2007


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65592 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-07 05:23:03 +00:00
shyouhei 4a80c0540f adopt sanitizer API
These APIs are much like <valgrind/memcheck.h>. Use them to
fine-grain annotate the usage of our memory.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-06 10:06:07 +00:00
ko1 870363886f fix type.
* string.c (rb_str_format_m): should pass `int`.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65456 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 22:16:26 +00:00
ko1 312b105d0e introduce TransientHeap. [Bug #14858]
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
  theap is designed for Ruby's object system. theap is like Eden heap
  on generational GC terminology. theap allocation is very fast because
  it only needs to bump up pointer and deallocation is also fast because
  we don't do anything. However we need to evacuate (Copy GC terminology)
  if theap memory is long-lived. Evacuation logic is needed for each type.

  See [Bug #14858] for details.

* array.c: Now, theap for T_ARRAY is supported.

  ary_heap_alloc() tries to allocate memory area from theap. If this trial
  sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
  We don't need to free theap ptr.

* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
  if ary is allocated at theap, force evacuation to malloc'ed memory.
  It makes programs slow, but very compatible with current code because
  theap memory can be evacuated (theap memory will be recycled).

  If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
  instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
  will occur, use RARRAY_CONST_PTR().

(re-commit of r65444)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 21:53:56 +00:00
svn 69b8ffcd5b * expand tabs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65448 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 21:02:12 +00:00
ko1 7d359f9b69 revert r65444 and r65446 because of commit miss
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 21:01:55 +00:00
ko1 90ac549fa6 introduce TransientHeap. [Bug #14858]
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
  theap is designed for Ruby's object system. theap is like Eden heap
  on generational GC terminology. theap allocation is very fast because
  it only needs to bump up pointer and deallocation is also fast because
  we don't do anything. However we need to evacuate (Copy GC terminology)
  if theap memory is long-lived. Evacuation logic is needed for each type.

  See [Bug #14858] for details.

* array.c: Now, theap for T_ARRAY is supported.

  ary_heap_alloc() tries to allocate memory area from theap. If this trial
  sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
  We don't need to free theap ptr.

* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
  if ary is allocated at theap, force evacuation to malloc'ed memory.
  It makes programs slow, but very compatible with current code because
  theap memory can be evacuated (theap memory will be recycled).

  If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
  instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
  will occur, use RARRAY_CONST_PTR().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-30 20:46:24 +00:00
stomar 8f0eb44d93 string.c: improve docs for String#strip and related
* string.c: [DOC] improve docs for String#{strip,lstrip,rstrip}{,!}:
  small clarification, avoid referring to the receiver as `str'
  (does not appear in the call-seq of the generated HTML docs),
  enable links for cross-references, simplify rdoc.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-26 20:30:09 +00:00
stomar af7f9de4b9 array.c, file.c, string.c: [DOC] fix typos
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-19 21:35:51 +00:00
nobu 569fe2922f string.c: grapheme cluster regexp failure
* string.c (get_reg_grapheme_cluster): show error info and relax
  to rb_fatal from rb_bug.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 09:11:12 +00:00
stomar 41a486e966 string.c: [DOC] add example code for String#strip!
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65068 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 19:04:02 +00:00
stomar ee49d04540 string.c: small doc improvement
* string.c: [DOC] move unaltered case for String#strip to the end,
  similar to other strip methods.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 19:02:51 +00:00
nobu fa8b08b424 Prefer `rb_fstring_lit` over `rb_fstring_cstr`
The former states explicitly that the argument must be a literal,
and can optimize away `strlen` on all compilers.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65059 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 09:59:22 +00:00
nobu 83a01e6f52 Added comments to rb_setup_fake_str and rb_fstring_new [ci skip]
`ptr` for these functions must refer constant string literals.
Otherwise, the result string's content can be modified/discarded
unexpectedly.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65058 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-13 09:23:56 +00:00
marcandre 2521b079fa [DOC] Improve String#strip documentation.
Patch by Josh Goldberg. [Fix GH-1933] [ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64757 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-09-16 02:49:44 +00:00
shyouhei 22444ae9b1 move function declarations from insns.def to internal.h
Just avoid being loose.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63755 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-27 00:57:16 +00:00
stomar 7215cecfb5 string.c: [DOC] grammar fixes
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63632 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-11 20:16:27 +00:00
nobu 46d7dc1162 [Docs] Improve documentation of String#lines
* Document about optional getline arguments
* Add examples, especially for the demonstration of `chomp: true`
[Fix GH-1886]

From: Koki Takahashi <hakatasiloving@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63610 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-08 10:45:01 +00:00
normal 256411b47f String#uminus dedupes unconditionally
[Feature #14478] [ruby-core:85669]

Thanks-to: Sam Saffron <sam.saffron@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63566 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-04 23:26:03 +00:00
nobu ce2f4f8526 string.c: trivial optimizations
* string.c (rb_str_aset): prefer BUILTIN_TYPE over TYPE after
  SPECIAL_CONST_P check.

* string.c (rb_str_start_with): prefer RB_TYPE_P over switch by
  TYPE.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63543 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-01 06:53:26 +00:00
nobu 87ccf7e50a string.c: doc for [Feature #13712]
* string.c (rb_str_start_with): [DOC] start_with? example with
  regexp.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-06-01 06:37:14 +00:00
normal 2fd1525b0f string.c: MAYBE_UNUSED to suppress warnings for `old`
Building with HAVE_MALLOC_USABLE_SIZE currently makes
SIZED_REALLOC_N ignore the old size arg.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63487 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-05-22 01:58:47 +00:00
normal 0a4be5beda string.c: size hints for free and realloc calls
Another part of the plan to reduce dependencies on malloc_usable_size:
https://bugs.ruby-lang.org/issues/10238

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63485 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-05-22 01:13:08 +00:00
nobu 703a5dd3e0 string.c: adjust to rb_str_upto_each
* range.c (range_each_func): adjust the signature of the callback
  function to rb_str_upto_each, and exit the loop if the callback
  returned non-zero.

* string.c (rb_str_upto_endless_each): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-28 11:16:54 +00:00
nobu 2c8f16e6c0 string.c: fix scanned substring with `\K`
* string.c (scan_once): fix the matched substring with `\K`, the
  beginning of that string may differ from the matched position.
  [ruby-core:86663] [Bug #14707]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63252 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-24 12:25:46 +00:00
mame 7f95eed19e Introduce endless range [Feature#12912]
Typical usages:
```
p ary[1..]          # drop the first element; identical to ary[1..-1]
(1..).each {|n|...} # iterate forever from 1; identical to 1.step{...}
```

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63192 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-19 15:18:50 +00:00
nobu 7c35618c53 string.c: suppress warning
* string.c (str_undump): get rid of warning C4129 by VC.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63170 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-17 04:12:57 +00:00
nobu cea438b0ca string.c: fix dumped suffix
* string.c (rb_str_dump): get rid of an error on evaling with
  frozen-string-literal enabled.  [ruby-core:86539] [Bug #14687]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-16 07:12:06 +00:00
nobu 5b2b1130cf string.c: fix checking order
* string.c (str_undump): check for suffix before if Unicode escape
  conflicts with it.  the message "but used force_encoding" sounds
  strange when it is not used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63162 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-16 06:37:42 +00:00
stomar 5e99863393 string.c: [DOC] fix typo
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63160 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-04-14 16:50:42 +00:00
naruse 42f1b58964 Factor out get_reg_grapheme_cluster
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62893 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 07:58:39 +00:00
naruse 41b2ef4685 fix each_grapheme_cluster's size [Bug #14363]
From: Hugo Peixoto <hugo.peixoto@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 07:58:38 +00:00
naruse 6e0f5b8407 Revert "each_grapheme_cluster shouldn't return size [Bug #14363]"
This reverts commit r62887.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 07:58:37 +00:00
naruse 613decd088 each_grapheme_cluster shouldn't return size [Bug #14363]
From: Stefan Schüßler <mail@stefanschuessler.de>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62888 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-22 06:59:54 +00:00
nobu 7506fde3e9 Improve documentation for 'text '.split
The documentation didn't mention trailing spaces and the
example only demonstrated the case with leading spaces.
[Fix GH-1845]

From: Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-21 16:02:26 +00:00
nobu 1bf9dec04c string.c: [DOC] split with block [ci skip]
* string.c (rb_str_split_m): [DOC] about split with block.
  [Feature #4780]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62790 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-17 04:13:26 +00:00
nobu 2258a97fe2 string.c: split with block
* string.c (rb_str_split_m): yield each split substrings if the
  block is given, instead of returing the array.  [Feature #4780]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-15 11:08:04 +00:00
nobu c05fa459bb quote symbols
* sprintf.c (ruby__sfvextra): quote symbols as identifiers.

* string.c (rb_id_quote_unprintable): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-14 02:35:51 +00:00
k0kubun 288b44328d Export some missing symbols for MJIT
tool/ruby_vm/views/_insn_name_info.erb: on Linux, rb_vm_insn_name_offset
was needed to compile with --jit-debug (Usually --jit-debug requires
more symbols than the situation without --jit-debug because -O2 skips
some functions to compile).

vm.c: when running transform_mjit_header.rb with --jit-wait,
rb_source_location_cstr was repoted to be missing.

string.c: ditto, for rb_str_eql
numeric.c: ditto, for rb_float_eql

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-08 13:54:37 +00:00
k0kubun ed935aa5be mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.

This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.

This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).

Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.

I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.

common.mk: update dependencies for mjit_compile.c.

internal.h: declare `rb_vm_insn_addr2insn` for MJIT.

vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.

win32/mkexports.rb: export thread/ec functions, which are used by MJIT.

include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify
that a function is exported only for MJIT.

array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.

I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.

Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>

Part of [Feature #14235]

---

* Known issues
  * Code generated by gcc is faster than clang. The benchmark may be worse
    in macOS. Following benchmark result is provided by gcc w/ Linux.
  * Performance is decreased when Google Chrome is running
  * JIT can work on MinGW, but it doesn't improve performance at least
    in short running benchmark.
  * Currently it doesn't perform well with Rails. We'll try to fix this
    before release.

---

* Benchmark reslts

Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores

- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option

** Optcarrot fps

Benchmark: https://github.com/mame/optcarrot

|         |2.0.0-p0 |r62186   |JIT off  |JIT on   |
|:--------|:--------|:--------|:--------|:--------|
|fps      |37.32    |51.46    |51.31    |58.88    |
|vs 2.0.0 |1.00x    |1.38x    |1.37x    |1.58x    |

** MJIT benchmarks

Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)

|           |2.0.0-p0 |r62186   |JIT off  |JIT on   |
|:----------|:--------|:--------|:--------|:--------|
|aread      |1.00     |1.09     |1.07     |2.19     |
|aref       |1.00     |1.13     |1.11     |2.22     |
|aset       |1.00     |1.50     |1.45     |2.64     |
|awrite     |1.00     |1.17     |1.13     |2.20     |
|call       |1.00     |1.29     |1.26     |2.02     |
|const2     |1.00     |1.10     |1.10     |2.19     |
|const      |1.00     |1.11     |1.10     |2.19     |
|fannk      |1.00     |1.04     |1.02     |1.00     |
|fib        |1.00     |1.32     |1.31     |1.84     |
|ivread     |1.00     |1.13     |1.12     |2.43     |
|ivwrite    |1.00     |1.23     |1.21     |2.40     |
|mandelbrot |1.00     |1.13     |1.16     |1.28     |
|meteor     |1.00     |2.97     |2.92     |3.17     |
|nbody      |1.00     |1.17     |1.15     |1.49     |
|nest-ntimes|1.00     |1.22     |1.20     |1.39     |
|nest-while |1.00     |1.10     |1.10     |1.37     |
|norm       |1.00     |1.18     |1.16     |1.24     |
|nsvb       |1.00     |1.16     |1.16     |1.17     |
|red-black  |1.00     |1.02     |0.99     |1.12     |
|sieve      |1.00     |1.30     |1.28     |1.62     |
|trees      |1.00     |1.14     |1.13     |1.19     |
|while      |1.00     |1.12     |1.11     |2.41     |

** Discourse's script/bench.rb

Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb

NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
 to fix it. Please wait for the fix.)

*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)

categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101

*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)

categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 11:22:28 +00:00
mame 552a5a993c string.c (rb_str_format_m): Fix the example code of the doc
Change `%08x` to `%016x` because of two reasons:

* `%016x` demonstrates that we can use two or more digits here.
* Currently, many people uses 64-bit environment.
  (I'm unsure if object_id is a good example here, though...)
I'm unsure if

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 08:40:22 +00:00
nobu 9237049efe string.c: clear substring code range
* string.c (str_substr): substring of broken code range string may
  be valid or broken.  patch by tommy (Masahiro Tomita) at
  [ruby-dev:50430] [Bug #14388].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62040 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-25 13:10:14 +00:00
shyouhei dc1e6f17ba sizeof(uintptr_t) != sizeof(uintptr_t *)
Reported by mame.  Thanks.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61865 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-16 03:09:53 +00:00
shyouhei 39cfa67b4f __builtin_assume_aligned for *(foo *) casts
These casts are guarded. Must be safe to assume alignments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61829 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-15 02:35:18 +00:00
nobu 6b5e0bd98c exclude flexible array size with old compilers
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61814 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-14 11:19:18 +00:00
mame 982e9e6235 string.c (struct mapping_buffer): Use FLEX_ARY_LEN
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61811 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-13 13:08:05 +00:00
usa b87571100a should cause preprocess error as other cases
* string.c (NONASCII_MASK): should cause preprocess error immediately if the
  compiler does not satisfy our assumptions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-10 03:54:02 +00:00
nobu e9cb552ec9 internal.h: remove dependecy on ruby/encoding.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61713 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-09 06:24:11 +00:00
nobu ee85a6e72b internal.h: remove dependecy on ruby/io.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-09 06:24:10 +00:00
nobu e043ae7348 string.c: out-of-bounds access
* string.c (rb_str_enumerate_lines): fix out-of-bounds access when
  record separator is longer than the last element.  [Bug #14257]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-06 08:44:17 +00:00
shyouhei beaf2ace87 ULL suffix is a C99ism
Don't assume long long == 8 bytes.

If you can assume C99, there are macros named UINT64_C and
such for appropriate integer literal suffixes.
If you can't, no way but do a bitwise or.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61594 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-04 07:51:17 +00:00
nobu a1bbb2a780 Fix doc typo in Symbol#to_proc [Fix GH-1785]
[ci skip]

From: Dimitris Zorbas <dimitrisplusplus@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61588 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-04 00:44:40 +00:00
nobu 634a48c5c1 string.c: chomp rs at the end
* string.c (rb_str_enumerate_lines): should chomp record separator
  only, but not a newline, at the end of the receiver as well as
  middle, if the separator is given.
  [ruby-core:84552] [Bug #14257]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-29 12:19:03 +00:00
kazu c9bdee3e0d [DOC] Fix typos in downcase [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61488 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-27 00:04:30 +00:00
nobu e2479cc43f encoding.c: rb_enc_find_index2
* string.c (str_undump): use rb_enc_find_index2 to find encoding
  by unterminated string.  check the format before encoding name.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61396 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-22 01:03:17 +00:00
nobu 168c019998 string.c: fix memory leak
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 07:59:00 +00:00
naruse 05d1d29d1f Don't allow mixed escape
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61381 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:09:17 +00:00
naruse 188d85934b move dump format validation into parsing epilogue
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61380 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:09:16 +00:00
naruse 29c6ca423c fix escapes in undump
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61379 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:08:57 +00:00
nobu 7c18db61a1 string.c: multiple codepoints
* string.c (undump_after_backslash): fix multiple codepoints in
  braces.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16 00:30:52 +00:00
nobu ae18c8f5b6 string.c: suppress warning
* string.c (str_undump): suppress maybe-uninitialized warning by
  gcc 7 and later.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16 00:03:51 +00:00
tadd bbec11d329 Implement String#undump to unescape String#dump-ed string
[Feature #12275] [close GH-1765]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61228 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-14 08:47:13 +00:00
nobu a1692f7fdf string.c: fix rb_external_str_new_with_enc
* string.c (rb_external_str_new_with_enc): do not search non-ascii
  by NULL pointer.  [ruby-core:84055] [Bug #14150]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60979 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-02 07:09:16 +00:00
nobu 73e41247b9 string.c: prefer rb_syserr_fail
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60761 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-14 03:02:58 +00:00
rhe a82aaea719 string.c: fix up r60748
An #ifdef was missing in r60748 and build broke on systems without
crypt_r().

https://rubyci.org/logs/rubyci.s3.amazonaws.com/unstable11s/ruby-trunk/log/20171112T162503Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12 17:10:29 +00:00
rhe 0b845a8458 string.c: fix memory leak in String#crypt
Use ALLOCV to allocate struct crypt_data for slightly cleaner and less
error-prone code. It is currently possible it leaks when an invalid
argument is passed to String#crypt or rb_str_new_cstr() fails to
allocate memory.

SIZEOF_CRYPT_DATA macro in missing/crypt.h is removed since it is not
used any longer.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12 15:55:04 +00:00
stomar 8b1c1c55a9 string.c: improve docs for String#{concat,<<}
* string.c: [DOC] remove a misleading call-seq for String#concat,
  which suggests that all arguments must be Integers in this case;
  also clarify in the example that the receiver is modified;
  fix grammar for String#<<; move references to the end.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07 20:15:59 +00:00
stomar 3262f4910f string.c: fix typos
* string.c: [DOC] fix typos in doxygen comments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60707 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07 20:11:09 +00:00
stomar 51b0230a9b string.c: improve docs
* string.c: [DOC] fix rdoc for cross reference; fix grammar.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60574 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-29 21:43:36 +00:00
watson1978 b03a44c4ac string.c: Improve String#prepend performance if only one argument is given
* string.c (rb_str_prepend_multi): Prepend the string without generating
    temporary String object if only one argument is given.
	This is very similar with https://github.com/ruby/ruby/pull/1634

	String#prepend -> 47.5 % up

    [Fix GH-1670] [ruby-core:82195] [Bug #13773]

* Before
      String#prepend      1.517M (± 1.8%) i/s -      7.614M in   5.019819s

* After
      String#prepend      2.236M (± 3.4%) i/s -     11.234M in   5.029716s

* Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "String#prepend" do |loop|
    loop.times { "!".prepend("hello") }
  end
end

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-27 14:55:03 +00:00
nobu 2050b50d85 string.c: comment layout [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-22 00:00:41 +00:00
svn 4b8c94dd84 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:49:36 +00:00
sonots 84616bf979 * string.c: [DOC] Split rdoc of String#<< and String#concat [ci skip]
Split String#<< and String#concat docs to reflect single and multiple
arguments

patched by MSP-Greg [fix GH-1614]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:49:35 +00:00
sonots d9e11970f8 * string.c: Remove errant "the" in gsub documentation
patched by jlmuir (J. Lewis Muir) [fix GH-1679]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:35:40 +00:00
nobu 80c50308f9 Improve performance of string interpolation
This patch will add pre-allocation in string interpolation.
By this, unecessary capacity resizing is avoided.

For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large.  `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

* Before
  ```
  Calculating -------------------------------------
  Large string interpolation
                            1.276M (± 5.9%) i/s -      6.358M in   5.002022s
  Small string interpolation
                            5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  ```

* After
  ```
  Calculating -------------------------------------
  Large string interpolation
                            2.201M (± 5.8%) i/s -     11.063M in   5.043724s
  Small string interpolation
                            5.192M (± 5.7%) i/s -     25.971M in   5.020516s
  ```

* Test code
  ```ruby
  require 'benchmark/ips'

  Benchmark.ips do |x|
    x.report "Large string interpolation" do |t|
      a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
      b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

      t.times do
        "#{a}, #{b}!"
      end
    end

    x.report "Small string interpolation" do |t|
      a = "Hello"
      b = "World"

      t.times do
        "#{a}, #{b}!"
      end
    end
  end
  ```

[Fix GH-1626]
From: Nao Minami <south37777@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:21:05 +00:00
hsbt 2c27e52f8e Add documentation for `chomp` option.
https://github.com/ruby/ruby/pull/1717

  Patch by @ksss [fix GH-1717]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 16:11:58 +00:00
sonots ec30bc5930 * string.c (deleted_prefix_length, deleted_suffix_length):
Add doxygen comment.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60254 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 10:33:25 +00:00
naruse 6187b0001b [Feature #13712] String#start_with? supports regexp
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 06:51:01 +00:00
glass 8320be1007 string.c: avoid unnecessary call of str_strlen()
* string.c (rb_strseq_index): refactor and avoid
  call of str_strlen() when offset == 0.
  it will improve performance of String#index and #include?

* benchmark/bm_string_index.rb: benchmark for this change

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-01 13:44:49 +00:00
nobu 16759238ad string.c: fix ASCII-only on succ
* string.c (str_succ): clear coderange cache when no alpha-numeric
  character case, carried part may become ASCII-only.
  [ruby-core:83062] [Bug #13952]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60066 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-30 00:01:23 +00:00
nobu 2d42119903 string.c: ASCII-incompatible is not ASCII only
* string.c (tr_trans): ASCII-incompatible encoding strings cannot
  be ASCII-only even if valid.  [ruby-core:83056] [Bug #13950]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-29 08:15:50 +00:00
nobu 8c59fdb8d8 dup String#split return value
* string.c (rb_str_split): return duplicated receiver, when no
  splits.  patched by tompng (tomoya ishida) in [ruby-core:82911],
  and the test case by Seiei Miyagi <hanachin@gmail.com>.
  [Bug#13925] [Fix GH-1705]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:07 +00:00
nobu e1be1d0c38 dup String#rpartition return value
* string.c (rb_str_rpartition): return duplicated receiver, when
  no splits.  [ruby-core:82911] [Bug#13925]

Author:    Seiei Miyagi <hanachin@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60001 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:06 +00:00
nobu b0326bce01 dup String#partition return value
* string.c (rb_str_partition): return duplicated receiver, when no
  splits.  [ruby-core:82911] [Bug#13925]

Author:    Seiei Miyagi <hanachin@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:05 +00:00
nobu b2da3824c5 refinements in string interpolation
* compile.c (iseq_compile_each0): insert to_s method call, so that
  refinements activated at the caller should take place.
  [Feature #13812]

* insns.def (tostring): fix up converted object to a string,
  infect and fallback.

* insns.def (branchiftype): new instruction for conversion.
  branches if TOS is an instance of the given type.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-18 02:27:13 +00:00
kazu 0f25c6d7d5 Fix a typo [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 13:46:31 +00:00
nobu bd10ce165c string.c: fix false coderange
* string.c (rb_enc_str_scrub): enc can differ from the actual
  encoding of the string, the cached coderange is useless then.
  [ruby-core:82674] [Bug #13874]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 13:11:44 +00:00