Commit graph

1900 commits

Author SHA1 Message Date
Peter Zhu 1da2e7fca3
[Feature #19579] Remove !USE_RVARGC code (#7655)
Remove !USE_RVARGC code

[Feature #19579]

The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
2023-04-04 17:30:06 -04:00
Takashi Kokubun 32e0c97dfa RJIT: Optimize String#bytesize 2023-03-18 23:35:42 -07:00
Takashi Kokubun 233ddfac54 Stop exporting symbols for MJIT 2023-03-06 21:59:23 -08:00
Takashi Kokubun f0218303e0 Optimize String#getbyte 2023-03-05 23:28:59 -08:00
Rômulo Ceccon d78ae78fd7 rb_str_modify_expand: clear the string coderange
[Bug #19468]

b0b9f7201a erroneously stopped
clearing the coderange.

Since `rb_str_modify` clears it, `rb_str_modify_expand`
should too.
2023-03-03 15:32:25 +01:00
John Bampton 2f7270c681
Fix spelling (#7389) 2023-02-27 09:56:06 -08:00
Adam Daniels 2535b1819f Symbol#end_with? accepts Strings only
Regular expressions are not supported (same as String#end_with?).
2023-02-27 09:26:17 +09:00
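A minimal sketch of the behaviour this documentation fix describes. The helper name `suffix_check` is hypothetical and exists only for illustration; it is assumed here that a Regexp argument is rejected with a `TypeError`, as it is for `String#end_with?`.

```ruby
# Hypothetical helper (not part of Ruby): report how Symbol#end_with?
# reacts to a given suffix, returning the error class on failure.
def suffix_check(sym, suffix)
  sym.end_with?(suffix)
rescue TypeError => e
  e.class
end

p suffix_check(:hello, "lo")  # a String suffix is accepted
p suffix_check(:hello, /lo/)  # a Regexp suffix is rejected
```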
BurdetteLamar 3b239d2480 Remove (newly unneeded) remarks about aliases 2023-02-19 14:26:34 -08:00
zverok 51bb5b23d4 [DOC] Small adjustment for String method docs
* Hide freeze method (no useful docs, same as Object#freeze)
* Add dedup to call-seq of str_uminus
2023-02-19 22:32:52 +02:00
Matt Valentine-House d620855101 Rename rb_str_splice_{0,1} -> rb_str_update_{0,1} 2023-02-09 15:02:26 -05:00
Matt Valentine-House 601b83dcfc Remove alias macro rb_str_splice 2023-02-09 15:02:26 -05:00
Matt Valentine-House 72aba64fff Merge gc.h and internal/gc.h
[Feature #19425]
2023-02-09 10:32:29 -05:00
Jean Boussier c6b90e5e9c Mark "mapping_buffer" as write barrier protected
It doesn't have any reference so it can be marked as protected.
2023-02-03 19:10:42 +01:00
Shugo Maeda cce3960964 [Feature #19314] Add new arguments of String#bytesplice
bytesplice(index, length, str, str_index, str_length) -> string
  bytesplice(range, str, str_range) -> string

In these forms, the content of +self+ is replaced by str.byteslice(str_index, str_length) or str.byteslice(str_range); however the substring of +str+ is not allocated as a new string.
2023-01-20 18:02:37 +09:00
Shugo Maeda f7b72462aa
String#bytesplice should return self
In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.

This change should be included in Ruby 3.2.1.
2023-01-19 17:13:07 +09:00
Matt Valentine-House 8a93e5d01b Use str_enc_copy_direct to improve performance
str_enc_copy_direct copies the string encoding over without checking the
frozen status of the string. Because we know that we're safe here (we
only use this function when interpolating strings on the stack via a
concatstrings instruction) we can safely skip this check
2023-01-13 10:31:35 -05:00
Matt Valentine-House bb5fddd070 Remove MIN_PRE_ALLOC_SIZE from Strings.
This optimisation is no longer helpful now that we use VWA to allocate
strings in larger size pools where they can be embedded.
2023-01-13 10:31:35 -05:00
Peter Zhu bfc887f391 Add str_enc_copy_direct
This commit adds str_enc_copy_direct, which is like str_enc_copy but
does not check the frozen status of str1 and does not check the validity
of the encoding of str2. This makes certain string operations ~5% faster.

```ruby
require "benchmark"

puts(Benchmark.measure do
  100_000_000.times do
    "a".downcase
  end
end)
```

Before this patch:

```
  7.587598   0.040858   7.628456 (  7.669022)
```

After this patch:

```
  7.133128   0.039809   7.172937 (  7.183124)
```
2023-01-12 09:06:15 -05:00
Peter Zhu 9726736006 Set STR_SHARED_ROOT flag on root of string 2023-01-09 08:49:29 -05:00
Peter Zhu 3be2acfafd Fix re-embedding of strings during compaction
The reference updating code for strings is not re-embedding strings
because the code is incorrectly wrapped inside of a
`if (STR_SHARED_P(obj))` clause. Shared strings can't be re-embedded
so this ends up being a no-op. This means that strings can be moved to a
large size pool during compaction, but won't be re-embedded, which would
waste the space.
2023-01-09 08:49:29 -05:00
Peter Zhu d8ef0a98c6 [Bug #19319] Fix crash in rb_str_casemap
The following code crashes on my machine:

```ruby
GC.stress = true

str = "testing testing testing"

puts str.capitalize
```

We need to ensure that the object `buffer_anchor` remains on the stack
so it does not get GC'd.
2023-01-06 11:36:28 -05:00
Nobuyoshi Nakada 98fbebf110
[DOC] Fix typo 2022-12-22 00:01:18 +09:00
S-H-GAMELINKS 1a64d45c67 Introduce encoding check macro 2022-12-02 01:31:27 +09:00
Jeremy Evans 571d21fd4a Make String#rstrip{,!} raise Encoding::CompatibilityError for broken coderange
It's questionable whether we want to allow rstrip to work for strings
where the broken coderange occurs before the trailing whitespace and
not after, but this approach is probably simpler, and I don't think
users should expect string operations like rstrip to work on broken
strings.

In some cases, this changes rstrip to raise
Encoding::CompatibilityError instead of ArgumentError.  However, as
the problem is related to an encoding issue in the receiver, and
not due to an issue with an argument, I think
Encoding::CompatibilityError is the more appropriate error.

Fixes [Bug #18931]
2022-11-24 18:24:42 -08:00
S-H-GAMELINKS 1f4f6c9832 Using UNDEF_P macro 2022-11-16 18:58:33 +09:00
Takashi Kokubun e7443dbbca
Rewrite Symbol#to_sym and #intern in Ruby (#6683) 2022-11-15 21:34:30 -08:00
Peter Zhu 710c1ada84 Use string's capacity to determine if reembeddable
During auto-compaction, using length to determine whether or not a
string can be re-embedded may be a problem for newly created strings.
This is because usually it requires a malloc before setting the length.
If the malloc triggers compaction, then the string may be re-embedded
and can cause crashes.
2022-11-14 16:59:43 -05:00
Peter Zhu 0468136a1b Make str_alloc_heap return a STR_NOEMBED string
This commit refactors str_alloc_heap to return a string with the
STR_NOEMBED flag set.
2022-11-03 09:09:11 -04:00
Vaevictusnet 7726f6bfff Correcting example for swapcase! method
Line 3 of the swapcase! example was incorrect: it implied that swapcase! did /not/ change the starting string.
2022-10-04 10:07:01 +09:00
Peter Zhu 28a572f8bf Fix bug when slicing a string with broken encoding
Commit aa2a428 introduced a bug where non-embedded string slices copied
the encoding of the original string. If the original string had a broken
encoding but the slice has valid encoding, then the slice would be
incorrectly marked as broken encoding.
2022-09-28 09:05:23 -04:00
Peter Zhu 6f8d17e43c Make string slices views rather than copies
Just like commit 1c16645 for arrays, this commit changes string slices
to be a view rather than a copy even if it can be allocated through VWA.
2022-09-28 09:05:23 -04:00
Peter Zhu aa2a428cfb Refactor str_substr and str_subseq
This commit extracts common code between str_substr and rb_str_subseq
into a function called str_subseq.

This commit also applies optimizations in commit 2e88bca to
rb_str_subseq.
2022-09-26 14:54:32 -04:00
Jean Boussier 2e88bca24f string.c: don't create a frozen copy for str_new_shared
str_new_shared already has all the necessary logic to do this
and is also smart enough to skip this step if the source string
is already a shared string itself.

This saves a useless String allocation on each call.
2022-09-26 13:41:17 +02:00
Kazuki Yamaguchi 5b0396473b Fix coderange calculation in String#b
Leave the new coderange unknown if the original encoding is not
ASCII-compatible. Non-ASCII-compatible encoding strings with valid or
broken coderange can end up as ascii-only.

Fixes 9a8f6e392f ("Cheaply derive code range for String#b return
value", 2022-07-25).
2022-09-26 16:44:46 +09:00
Yusuke Endoh a78c733cc3 Revert "Revert "error.c: Let Exception#inspect inspect its message""
This reverts commit b9f030954a.

[Bug #18170]
2022-09-23 16:40:59 +09:00
Benoit Daloze 6525b6f760 Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32
* And simplify callers of get_actual_encoding().
* See [Feature #18949].
* See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-12 14:02:34 +02:00
Kazuki Yamaguchi aff6534e32 Avoid unnecessary copying when removing the leading part of a string
Remove the superfluous str_modify_keep_cr() call from rb_str_update().
It ends up calling either rb_str_drop_bytes() or rb_str_splice_0(),
which already does checks if necessary.

The extra call makes the string "independent". This is not always
wanted, in other words, it can keep the same shared root when merely
removing the leading part of a shared string.
2022-09-09 16:03:20 +09:00
Jean Boussier cd1724bdde rb_str_concat_literals: use rb_str_buf_append
That's about 1.30x faster.
2022-09-08 15:02:21 +02:00
Nobuyoshi Nakada 332d29df53
[DOC] non-positive `base` in `Kernel#Integer` and `String#to_i` 2022-09-08 11:52:16 +09:00
Nobuyoshi Nakada 576bdec03f [Bug #18973] Promote US-ASCII to ASCII-8BIT when adding 8-bit char 2022-08-31 17:27:59 +09:00
Nobuyoshi Nakada fe4dd18db4
[DOC] Fix a typo [ci skip] 2022-08-27 12:54:42 +09:00
Nobuyoshi Nakada 43e8d9a050 Check if encoding capable object before check if ASCII compatible 2022-08-20 10:06:40 +09:00
Jean Boussier b0b9f7201a rb_str_resize: Only clear coderange on truncation
If we are expanding the string or only stripping extra capacity
then coderange won't change, so clearing it is wasteful.
2022-08-18 10:09:08 +02:00
Jeremy Evans 49517b3bb4 Fix inspect for unicode codepoint 0x85
This is an inelegant hack, by manually checking for this specific
code point in rb_str_inspect.  Some testing indicates that this is
the only code point affected.

It's possible a better fix would be inside of lower-level encoding
code, such that rb_enc_isprint would return false and not true for
codepoint 0x85.

Fixes [Bug #16842]
2022-08-11 08:47:29 -07:00
Nobuyoshi Nakada 2d1cf658ee
Adjust indent [ci skip] 2022-07-26 18:33:21 +09:00
Kevin Menard 9a8f6e392f Cheaply derive code range for String#b return value
The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan.
2022-07-26 09:03:44 +02:00
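The properties this commit relies on can be observed from Ruby code. The sketch below only inspects the public result of `String#b`; the internal code range flag itself is not visible from Ruby, and the helper name `binary_view` is hypothetical.

```ruby
# Hypothetical helper: summarise the encoding properties of String#b's result.
def binary_view(str)
  b = str.b
  { encoding: b.encoding, ascii_only: b.ascii_only?, valid: b.valid_encoding? }
end

p binary_view("abc")   # 7-bit input stays ASCII-only
p binary_view("é")     # multibyte input is valid binary, but not ASCII-only
p binary_view("\xFF")  # broken as UTF-8, yet valid as binary
```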
Jean Boussier 31a5586d1e rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID
If the RHS has valid encoding, and both strings have the same
encoding, we can use the fast path.

However we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```
2022-07-25 14:18:52 +02:00
Takashi Kokubun 5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Jeremy Evans 423b41cba7 Make String#each_line work correctly with paragraph separator and chomp
Previously, it was including one newline when chomp was used,
which is inconsistent with IO#each_line behavior. This makes
behavior consistent with IO#each_line, chomping all paragraph
separators (multiple consecutive newlines), but not single
newlines.

Partially Fixes [Bug #18768]
2022-07-21 08:02:32 -07:00
Jean Boussier f954c5dae4 string.c: use str_enc_fastpath in TERM_LEN
Not having to fetch the rb_encoding saves a significant
amount of time.

Additionally, even when we have to fetch it, we can do
it faster using `ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```
2022-07-21 15:06:50 +02:00
Jean Boussier cb9fd920a3 str_buf_cat: preserve coderange when going through fastpath
rb_str_modify clears the coderange, which in this case isn't
necessary.

```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    360.617k|  605.091k|
|                      |           -|     1.68x|
|binary_concat_binary  |    446.650k|  605.053k|
|                      |           -|     1.35x|
|utf8_concat_utf8      |    454.166k|  597.311k|
|                      |           -|     1.32x|
```

```
|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|erb_render  |      1.790M|    2.045M|
|            |           -|     1.14x|
```
2022-07-19 10:41:40 +02:00
Jean Boussier 0ae8dbbee0 rb_str_buf_append: fastpath to str_buf_cat
If the LHS is ASCII compatible and the RHS is 7BIT
we can directly concat without being concerned about
anything else.

Benchmark:
```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always has a precomputed
coderange, so the benchmark never enters the slow path of having to
scan the RHS. However, it's extremely likely that we'll end
up scanning it anyway in rb_enc_cr_str_buf_cat.
2022-07-19 10:41:40 +02:00
Jean Boussier d084585f01 Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT
Otherwise it's way too easy to confuse it with US_ASCII.
2022-07-19 08:48:56 +02:00
Burdette Lamar 081bd061a8
[DOC] Correct call-seq directive in string.c (#6131)
Correct call-seq directive in string.c
2022-07-13 10:44:22 -05:00
S-H-GAMELINKS 420f3ced4d Using is_ascii_string to check encoding 2022-06-17 12:02:50 +09:00
Alan Wu 714a4942fd
Remove unused and accidentally public rb_str_shared_root_p()
This function was added to a public header in [1] probably
unintentionally since it's not used anywhere, exposes implementation
details, and isn't related to the goals of that pull request.

[1]: 56cc3e99b6
2022-06-16 07:20:20 -04:00
Nobuyoshi Nakada 048f14221c
Add placeholder to let braces match 2022-06-14 10:21:55 +09:00
Matt Valentine-House 56cc3e99b6 Move String RVALUES between pools
And re-embed any strings that can now fit inside the slot they've been
moved to
2022-06-13 10:11:27 -07:00
Alexander Ilyin adcfd69690
[DOC] Fix markup for `String` (#5984)
* Add missing space for `String#start_with?`.
* Add missing pluses for `String#tr` and
  `Methods for Converting to New String` label.
* Move quote into the tag for `Whitespace in Strings` label.
2022-06-09 13:40:21 -05:00
Yusuke Endoh b9f030954a Revert "error.c: Let Exception#inspect inspect its message"
This reverts commit 9d927204e7.
2022-06-07 11:52:44 +09:00
Yusuke Endoh 9d927204e7 error.c: Let Exception#inspect inspect its message
... only when the message string has a newline.

`p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">`
instead of:

    #<StandardError:
    bar>

[Bug #18170]
2022-06-07 11:07:09 +09:00
Jean Boussier 65122d09d5 [Feature #18595] Alias String#-@ as String#dedup 2022-05-20 11:31:59 -07:00
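A short sketch of what the aliased operator does, assuming the usual interning behaviour of `String#-@`. The `dedup` call is guarded because the alias only exists on Ruby versions that include this commit.

```ruby
# Two separately built strings with equal content.
a = -("foo" + "bar")
b = -("foo" + "bar")

p a.frozen?    # the deduplicated string is frozen
p a.equal?(b)  # both names refer to the same interned object

# On Rubies that include this commit, #dedup is the same operation.
p a.equal?(("foo" + "bar").dedup) if String.method_defined?(:dedup)
```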
Nobuyoshi Nakada 5d45afdbbf
[DOC] Move the documentations of moved Symbol methods 2022-04-14 11:17:37 +09:00
Burdette Lamar dfdc03248f
[DOC] Enhanced RDoc for Symbol (#5796)
Treats:
    #[]
    #length
    #empty?
    #upcase
    #downcase
    #capitalize
    #swapcase
    #start_with?
    #end_with?
    #encoding
    ::all_symbols
2022-04-13 13:45:18 -05:00
Nobuyoshi Nakada 7e97ebb6eb
Enforce literals on the second arguments 2022-04-13 18:33:34 +09:00
Burdette Lamar b21026cb1a
Enhanced RDoc for Symbol (#5795)
Treats:

    #==
    #inspect
    #name
    #to_s
    #to_sym
    #to_proc
    #succ
    #<=>
    #casecmp
    #casecmp?
    #=~
    #match
    #match?
2022-04-12 17:27:18 -05:00
Burdette Lamar 70415071e8
Fix some RDoc links (#5778) 2022-04-08 14:25:38 -05:00
Burdette Lamar 9ca3d537b9
All-in-one RDoc for class String (#5777) 2022-04-07 14:29:04 -05:00
Burdette Lamar 717b20ee30
[DOC] Enhanced RDoc for string slices (#5769)
Creates file doc/string/slices.rdoc that the string slicing methods can link to.
2022-04-06 15:47:22 -05:00
Burdette Lamar 4a4485adbd
Enhanced RDoc for String#index (#5759) 2022-04-04 14:18:10 -05:00
Burdette Lamar 0b0ae583f4
[DOC] Enhanced RDoc for String (#5753)
Treats:
    #length
    #bytesize
2022-04-03 10:09:34 -05:00
Burdette Lamar 7be4d900f0
[DOC] Enhanced RDoc for String (#5751)
Adds to doc for String.new, also making it compliant with documentation_guide.rdoc.
    Fixes some broken links in io.c (that I failed to correct yesterday).
2022-04-02 14:26:49 -05:00
Burdette Lamar 056b7a8633
[DOC] Enhanced RDoc for String (#5742)
Treats:
    #force_encoding
    #b
    #valid_encoding?
    #ascii_only?
    #scrub
    #scrub!
    #unicode_normalized?
Plus a couple of minor tweaks.
2022-03-31 15:09:25 -05:00
Burdette Lamar ffcdbedbfb
Repaired What's Here sections for Range, String, Symbol, Struct (#5735)
Repaired What's Here sections for Range, String, Symbol, Struct.
2022-03-30 13:46:24 -05:00
Burdette Lamar b257034ae5
[DOC] Enhanced RDoc for String (#5730)
Treats:

    #start_with?
    #end_with?
    #delete_prefix
    #delete_prefix!
    #delete_suffix
    #delete_suffix!
2022-03-29 09:54:29 -05:00
Burdette Lamar 5525e47a0b
[DOC] Enhanced RDoc for String (#5726)
Treats:

    #ljust
    #rjust
    #center
    #partition
    #rpartition
2022-03-28 15:49:18 -05:00
Burdette Lamar d52cf1013f
[DOC] Enhanced RDoc for String (#5724)
Treats:

    #scan
    #hex
    #oct
    #crypt
    #ord
    #sum
2022-03-27 14:45:14 -05:00
Nobuyoshi Nakada 1b0f05168d
[DOC] Fix references to unary operator 2022-03-27 11:24:06 +09:00
Burdette Lamar e699e2d9bf
Enhanced RDoc for String (#5723)
Treats:

    #lstrip
    #lstrip!
    #rstrip
    #rstrip!
    #strip
    #strip!

Adds section Whitespace in Strings.
2022-03-26 12:42:44 -05:00
Nobuyoshi Nakada 300f4677c9
[DOC] Use simple references to operator methods
Method references can not only be marked up as code, but also
reflect the `--show-hash` option.
The bug that prevented the old rdoc from correctly parsing these
methods was fixed last month.
2022-03-26 21:13:16 +09:00
Burdette Lamar 465edb96f0
[DOC] Enhanced RDoc for String (#5707)
Treated:

    #chomp
    #chomp!
    #chop
    #chop!
2022-03-24 19:40:58 -05:00
Burdette Lamar 0140e6c41e
[DOC] Enhanced RDoc for String (#5685)
Treats:

    #chars
    #codepoints
    #each_char
    #each_codepoint
    #each_grapheme_cluster
    #grapheme_clusters

Also, corrects a passage in #unicode_normalize that mentioned module UnicodeNormalize, whose doc (:nodoc:, actually) says not to mention it.
2022-03-22 14:51:05 -05:00
Burdette Lamar c129b6119d
[DOC] Use RDoc inclusions in string.c (#5683)
As @peterzhu2118 and @duerst have pointed out, putting string method's RDoc into doc/ (which allows non-ASCII in examples) makes the "click to toggle source" feature not work for that method.

This PR moves the primary method doc back into string.c, then includes RDoc from doc/string/*.rdoc, and also removes doc/string.rdoc.

The affected methods are:

    ::new
    #bytes
    #each_byte
    #each_line
    #split

The call-seq is in string.c because it works there; it did not work when the call-seq is in doc/string/*.rdoc.

This PR also updates the relevant guidance in doc/documentation_guide.rdoc.
2022-03-21 14:58:00 -05:00
Burdette Lamar d52f41b765
[DOC] Enhanced RDoc for String (#5675)
Treats:
    #split
    #each_line
    #lines
    #each_byte
    #bytes
2022-03-18 17:17:00 -05:00
Shugo Maeda 1107839a7f Add String#bytesplice 2022-03-18 11:51:03 +09:00
Burdette Lamar 59a1a8185f
[DOC] Enhanced RDoc for String#split (#5644)
* Enhanced RDoc for String#split
2022-03-16 14:45:48 -05:00
Nobuyoshi Nakada 4d93b6299c
Initialize mutex for crypt(3) statically
Assuming that all platforms where only `crypt` is available, but
not `crypt_r`, are POSIX-based.
2022-03-16 18:51:34 +09:00
Burdette Lamar 561dda9934
[DOC] Enhanced RDoc for String (#5635)
Treats:

    #count
    #delete
    #delete!
    #squeeze
    #squeeze!

Adds section "Multiple Character Selectors" to doc/character_selectors.rdoc.

Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2022-03-09 19:53:51 -06:00
Burdette Lamar 72c038a8f5
[DOC] Enhanced RDoc for String (#5633)
Treats:

    #tr (revised to link to "Character Selectors" document)
    #tr!
    #tr_s
    #tr_s!

Also renames doc/character_selector.rdoc to match its title.
2022-03-09 08:42:12 -06:00
Kazuhiro NISHIYAMA b068a53dc9
[DOC] Fix default offset of String#byterindex 2022-03-09 15:15:11 +09:00
Burdette Lamar faff37da57
[DOC] Enhanced RDoc for String #tr and #tr! (#5626) 2022-03-07 12:58:29 -06:00
Nobuyoshi Nakada 7f7f07a600
[DOC] mark `rb_str_init` as `:nodoc:`
Otherwise, an empty entry will be generated as `String::new` along
with the one from doc/string.rb.
2022-03-03 13:39:07 +09:00
Mau Magnaguagno 347c3faf8e
[DOC] Fix String#getbyte doc
* String#getbyte returns `nil` if `index` is out of range.

* Add String#getbyte example with nil output.

* Modify String#getbyte example to use negative index.
2022-03-01 10:05:49 +09:00
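The three points in this documentation fix can be illustrated as follows (the sample string is arbitrary):

```ruby
s = "abc"

p s.getbyte(0)   # → 97, the first byte
p s.getbyte(-1)  # → 99, a negative index counts from the end
p s.getbyte(3)   # → nil, the index is out of range
```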
Nobuyoshi Nakada 3e5d7e3176
[DOC] Move String.new to allow non US-ASCII characters 2022-02-26 21:50:46 +09:00
Burdette Lamar 26ffda2fd2
[DOC] Enhanced RDoc for some encoding methods (#5598)
In String, treats:

    #b
    #scrub
    #scrub!
    #unicode_normalize
    #unicode_normalize!
    #encode
    #encode!

Also adds a note to IO.new (suggested by @jeremyevans).
2022-02-25 13:12:59 -06:00
Shugo Maeda 63401b1384
Rename the wrong variable name `beg` to `len` 2022-02-23 11:23:33 +09:00
Nobuyoshi Nakada 8f0e3a97f9
rb_debug_rstring_null_ptr: add newlines in the message [ci skip]
The message should end with a newline, and break the long
paragraph.
2022-02-21 16:22:23 +09:00
Shugo Maeda c8817d6a3e
Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518)
* Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110]

Co-authored-by: NARUSE, Yui <naruse@airemix.jp>
2022-02-19 19:10:00 +09:00
Nobuyoshi Nakada 6e65e04186
[DOC] Remove unnecessary `rdoc-ref:` schemes 2022-02-12 12:38:37 +09:00
Nobuyoshi Nakada 50c972a1ae
[DOC] Simplify operator method references 2022-02-12 12:38:36 +09:00
Paarth Madan 2a30ddd9f3 Remove extraneous "." in String#+@ documentation 2022-02-08 10:33:49 +09:00
Nobuyoshi Nakada 8ca7b0b68a
[DOC] Fix broken links to operator methods
Once https://github.com/ruby/rdoc/pull/865 is merged, these hacks
are no longer needed.
2022-02-08 01:39:37 +09:00
Nobuyoshi Nakada 07bf65858d
[DOC] Fix broken links to case_mapping.rdoc 2022-02-08 01:28:08 +09:00
Nobuyoshi Nakada 16fdc1ff46
[DOC] Fix broken links to literals.rdoc 2022-02-08 01:27:52 +09:00
Nobuyoshi Nakada bc5662d9d8
[DOC] Simplify links to global methods 2022-02-08 01:18:56 +09:00
Peter Zhu a32e5e1b97 [DOC] Use RDoc link style for links in the same class/module
I used this regex:

(?<=\[)#(?:class|module)-([A-Za-z]+)-label-([A-Za-z0-9\-\+]+)

And performed a global find & replace for this:

rdoc-ref:$1@$2
2022-02-07 09:52:06 -05:00
Peter Zhu f9a2802bc5 [DOC] Use RDoc link style for links to other classes/modules
I used this regex:

([A-Za-z]+)\.html#(?:class|module)-[A-Za-z]+-label-([A-Za-z0-9\-\+]+)

And performed a global find & replace for this:

rdoc-ref:$1@$2
2022-02-07 09:52:06 -05:00
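The two find-and-replace passes described in these commits can be sketched in Ruby. This is an illustration only: the commits do not say which tool was used, the constant and method names are hypothetical, and the sample links are made up. The editor-style `$1@$2` replacement is written as `\1@\2` for `String#gsub`.

```ruby
# Pattern for links within the same class/module (from the first commit).
SAME_CLASS_LINK  = /(?<=\[)#(?:class|module)-([A-Za-z]+)-label-([A-Za-z0-9\-\+]+)/
# Pattern for links to other classes/modules (from the second commit).
OTHER_CLASS_LINK = /([A-Za-z]+)\.html#(?:class|module)-[A-Za-z]+-label-([A-Za-z0-9\-\+]+)/

# Hypothetical helper: apply both replacements to a piece of RDoc text.
def to_rdoc_ref(text)
  text.gsub(SAME_CLASS_LINK, 'rdoc-ref:\1@\2')
      .gsub(OTHER_CLASS_LINK, 'rdoc-ref:\1@\2')
end

puts to_rdoc_ref("{Slices}[#class-String-label-Slices]")
puts to_rdoc_ref("{Kernel}[Kernel.html#module-Kernel-label-Converting]")
```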
Burdette Lamar a07fa198a6
Improve links to labels in string.c and struct.c (#5531) 2022-02-06 09:44:40 -06:00
Peter Zhu be68b3a490 Change termlen when changing encoding during concatenation
After changing the encoding, we should update the terminator length.
2022-01-07 10:50:03 -05:00
Nobuyoshi Nakada 3f9af8a9dc
[DOC] Fix typos in a doxygen comment [ci skip] 2022-01-07 23:55:59 +09:00
Peter Zhu ae0d67d762 Revert "Set encoding before concatenating to string"
This reverts commit 44368b5f8b.
2022-01-06 17:23:05 -05:00
Peter Zhu 5f55b03716 Set correct termlen for frozen strings
Frozen strings should have the same termlen as the original string when
copy_encoding is true.
2022-01-06 14:33:35 -05:00
Peter Zhu 44368b5f8b Set encoding before concatenating to string
If we set encoding after the call to rb_str_buf_cat, then rb_str_buf_cat
will not set the correct terminator length.
2022-01-06 14:33:35 -05:00
Nobuyoshi Nakada 39bc5de833
Remove tainted and trusted features
These had already been announced for removal in 3.2.
2021-12-26 23:28:54 +09:00
Nobuyoshi Nakada e2ec97c4b8
[DOC] How to get the longest last match [Bug #18415] 2021-12-19 20:27:31 +09:00
Burdette Lamar 5588aa79d4
What's Here for Symbol (#5289)
* What's Here for Symbol
2021-12-17 17:02:12 -06:00
Burdette Lamar f7e266e6d2
Enhanced RDoc for case mapping (#5245)
Adds file doc/case_mapping.rdoc, which describes case mapping and provides a link target that methods doc can link to.

Revises:

    String#capitalize
    String#capitalize!
    String#casecmp
    String#casecmp?
    String#downcase
    String#downcase!
    String#swapcase
    String#swapcase!
    String#upcase
    String#upcase!
    Symbol#capitalize
    Symbol#casecmp
    Symbol#casecmp?
    Symbol#downcase
    Symbol#swapcase
    Symbol#upcase
2021-12-17 06:05:31 -06:00
Burdette Lamar e5ff030f60
Enhanced RDoc for String (#5234)
Treated:

    #to_i
    #to_f
    #to_s
    #inspect
    #dump
    #undump
2021-12-10 10:50:13 -06:00
Burdette Lamar 9a2ecddf32
Enhanced RDoc for String (#5227)
Treats:

    #replace
    #clear
    #chr
    #getbyte
    #setbyte
    #byteslice
    #reverse
    #reverse!
    #include?
2021-12-08 12:29:56 -06:00
Burdette Lamar 7fc9d83bd1
Fix link (#5208) 2021-12-03 10:46:35 -06:00
Burdette Lamar 28fb6d6b9e
Adding links to literals and Kernel (#5192)
* Adding links to literals and Kernel
2021-12-03 07:12:28 -06:00
Peter Zhu 7cfacbcad2 Improve performance of embedded string allocation
Non-VWA embedded string allocation had a performance regression. This
commit improves performance of non-VWA embedded string allocation.
2021-11-26 13:27:32 -05:00
Peter Zhu 9aded89f40 Speed up Ractors for Variable Width Allocation
This commit adds a Ractor cache for every size pool. Previously, all VWA
allocated objects used the slowpath and locked the VM.

On a micro-benchmark that benchmarks String allocation:

VWA turned off:
  29.196591   0.889709  30.086300 (  9.434059)

VWA before this commit:
  29.279486  41.477869  70.757355 ( 12.527379)

VWA after this commit:
  16.782903   0.557117  17.340020 (  4.255603)
2021-11-23 10:51:27 -05:00
Peter Zhu aeae6e2842 [Feature #18290] Remove all usages of rb_gc_force_recycle
This commit removes usages of rb_gc_force_recycle since it is a burden
to maintain and makes changes to the GC difficult.
2021-11-08 14:05:54 -05:00
Yusuke Endoh 4b248e7994 string.c: Follow up to ae2359f602
* Mention `\0`
* Make the example of hash replacement meaningful
2021-11-03 03:52:28 +09:00
Burdette Lamar ae2359f602
Enhanced RDoc for String (#5060)
Treated:

    #slice!
    #sub
    #sub!
    #gsub
    #gsub!
2021-11-02 13:04:58 -05:00
Burdette Lamar 3e743d3147
Cleanup some RDoc (#5050)
Mostly adding a blank line before and after each code segment, to improve compliance with doc/documentation_guide.rdoc.
2021-10-28 17:01:49 -05:00
Yusuke Endoh acb2f86caa string.c: Add some comments about STR flags 2021-10-29 01:57:29 +09:00
Peter Zhu a5b6598192 [Feature #18239] Implement VWA for strings
This commit adds support for embedded strings with variable capacity and
uses Variable Width Allocation to allocate strings.
2021-10-25 13:26:23 -04:00
Peter Zhu 46b66eb9e8 [Feature #18239] Add struct for embedded strings 2021-10-25 13:26:23 -04:00
Jeremy Evans 2a5c3a4d0f Update documentation for String and Symbol to discuss differences
Implements [Feature #14347]
2021-10-15 13:54:03 -07:00
Nobuyoshi Nakada 78ff9b719c
Add tests for the edge cases of `String#end_with?`
Also, check if a suffix is empty, to guarantee the assumption of
`onigenc_get_left_adjust_char_head` that `*s` is always accessible,
even in the case of `SHARABLE_MIDDLE_SUBSTRING`.
2021-10-08 14:08:03 +09:00
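From the Ruby side, the empty-suffix edge case looks like this. The sketch shows only the observable result, not the internal check added in C.

```ruby
p "abc".end_with?("")       # an empty suffix always matches
p "".end_with?("")          # even on an empty receiver
p "abc".end_with?("x", "")  # any matching suffix in the list is enough
```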
git 1bf3f3f4da * remove trailing spaces. [ci skip] 2021-10-06 00:40:54 +09:00
Jeremy Evans c6706f15af Fix documentation for String#{<<,concat,prepend}
These methods mutate and return the receiver, they don't create
and return a new string.

Fixes [Bug #18241]
2021-10-05 08:39:27 -07:00
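A small check of the corrected documentation: each of the three methods hands back the receiver itself. The helper name `returns_receiver?` is hypothetical.

```ruby
# Hypothetical helper: call the method on a fresh mutable string and
# report whether the return value is the very same object.
def returns_receiver?(method_name)
  receiver = +"b"
  result = receiver.public_send(method_name, "a")
  result.equal?(receiver)
end

%i[<< concat prepend].each do |m|
  puts "#{m}: #{returns_receiver?(m)}"
end
```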
Nobuyoshi Nakada cd182c5ee1
Adjust types to rb_enc_left_char_head
I dislike unnatural casts.
2021-10-05 17:14:29 +09:00
Nobuyoshi Nakada 5a961c3768
Remove a redundant cast between the exact same types 2021-10-05 15:56:34 +09:00
卜部昌平 f032c09bca rb_enc_left_char_head(): take void*
Nobu doesn't like (char*) cast.
2021-10-05 14:18:23 +09:00
卜部昌平 499660b04f downcase_single/upcase_single: assume ASCII
These functions assume ASCII compatibility.  That has to be ensured in
their caller.
2021-10-05 14:18:23 +09:00
卜部昌平 5112a54846 include/ruby/encoding.h: convert macros into inline functions
Less macros == huge win.
2021-10-05 14:18:23 +09:00
卜部昌平 e42c8c160d add undeclared variables
Why did they even exist?
2021-10-05 14:18:23 +09:00
Nobuyoshi Nakada 842b0008c1 Skip broken strings as the locale encoding 2021-10-01 20:28:44 +09:00
Kazuhiro NISHIYAMA e0c6e8c64a
[DOC] Use `unpack1` instead of `unpack(template)[0]` [ci skip] 2021-09-23 09:20:00 +09:00
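The two spellings compared in this documentation change give the same value. A brief sketch with an arbitrary template:

```ruby
bytes = [1, 2].pack("C*")

first_via_unpack  = bytes.unpack("C*")[0]  # builds the full array, then indexes it
first_via_unpack1 = bytes.unpack1("C*")    # returns only the first unpacked value

p first_via_unpack == first_via_unpack1
```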
Nobuyoshi Nakada cbbda3e648
Adjust indent in string.c [ci skip] 2021-09-16 23:49:16 +09:00
S.H b8c3a84bdd
Refactor and Using RBOOL macro 2021-09-15 08:11:05 +09:00
Nobuyoshi Nakada cd829bb078 Remove printf family from the mjit header
Linking printf family functions makes mjit objects link
unnecessary code.
2021-09-11 08:41:32 +09:00
卜部昌平 091faca99c include/ruby/internal/intern/string.h: add doxygen
Must not be a bad idea to improve documents. [ci skip]
2021-09-10 20:00:06 +09:00
Peter Zhu 5d81554281 [Bug #18154] Fix memory leak in String#initialize
String#initialize can leak memory when called on a string that is marked
with STR_NOFREE because it does not unset the STR_NOFREE flag.
2021-09-08 10:20:12 -04:00
Nobuyoshi Nakada edf01d4e82
Treat NULL fake string as an empty string
And the NULL string must be of size 0.
2021-08-17 18:45:36 +09:00
Jeremy Evans 84bf4d2ce5 Term fill in String#{,l,r}strip! even when SHARABLE_MIDDLE_SUBSTRING
Each of these methods calls str_modify_keep_cr before
term filling, which should ensure the backing string
uses private memory, and therefore term filling should
not affect other strings.

Skipping the term filling was added in
a707ab4bc8.

Fixes [Bug #12540]
2021-08-11 13:40:49 +09:00
Peter Zhu c463a5e008 Fix indentation in string.c
7 spaces were used for 2 levels of indentation. This commit changes it
to use 8 spaces.
2021-08-03 16:39:02 -04:00