This change:
* Added an explanation about back references except \n and \k<n>
(\` \& \' \+ \0)
* Added an explanation about an escape (\\)
* Added some rdoc references
* Rephrased and clarified the reason why double escape is needed, added
some examples, and moved the note to the end (because it is not
specific to the method itself).
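A minimal sketch of the special back references and the backslash escape listed above, assuming the documented methods are String#sub and String#gsub. The sample strings and variable names are illustrative only.

```ruby
s = "hello world"

# In a single-quoted literal a lone backslash is kept as-is, so '\0'
# reaches sub as backslash + "0".
whole   = s.sub(/o/, '<\0>')        # \0 (same as \&): the entire match
before  = s.sub(/o/, '<\`>')        # \`: the text before the match
after   = s.sub(/o/, "<\\'>")       # \': the text after the match
last    = "ab".sub(/(a)(b)/, '\+')  # \+: the last group that matched

# Two backslashes must reach sub to produce one literal backslash,
# which is why the literal below contains four of them.
literal = s.sub(/o/, '<\\\\0>')

p whole, before, after, last, literal
```

The double escape is needed because the string literal consumes one level of backslashes before the replacement string is interpreted.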
rb_fstring behavior in this case is to freeze the receiver. I'm
not sure if that should be changed, so this takes the conservative
approach of duping the receiver in String#-@ before passing
to rb_fstring.
Fixes [Bug #15926]
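A small sketch of the contract this change protects: String#-@ should hand back a frozen string without freezing the receiver. It shows only the general behavior, not the specific edge case from the bug report, and the variable names are illustrative.

```ruby
s = String.new("abc")   # an unfrozen, non-literal string
deduped = -s            # frozen, possibly shared, copy

p s.frozen?             # the receiver is expected to stay unfrozen
p deduped.frozen?

s << "d"                # so mutating the receiver afterwards still works
p s
p deduped
```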
* string.c (get_reg_grapheme_cluster): make regexp from properly
encoded sources for wide-char encodings. [Bug #15965]
* regparse.c (node_extended_grapheme_cluster): suppress false
duplicated range warning for the time being.
When a string is frozen, its capacity is resized to fit (if it is much
larger), since we know it will no longer be mutated.
> puts ObjectSpace.dump(String.new("a"*30, capacity: 1000))
{"type":"STRING", "class":"0x7feaf00b7bf0", "bytesize":30, "capacity":1000, "value":"...
> puts ObjectSpace.dump(String.new("a"*30, capacity: 1000).freeze)
{"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "bytesize":30, "value":"...
(ObjectSpace.dump doesn't show capacity if capacity is equal to bytesize)
Previously, if we dedup into an fstring, using String#-@, capacity would
not be reduced.
> puts ObjectSpace.dump(-String.new("a"*30, capacity: 1000))
{"type":"STRING", "class":"0x7feaf00b7bf0", "frozen":true, "fstring":true, "bytesize":30, "capacity":1000, "value":"...
This commit makes rb_fstring call rb_str_resize, the same as
rb_str_freeze does.
Closes: https://github.com/ruby/ruby/pull/2256
Registering a string that depends on a dependent string as an fstring
can lead to use-after-free. See c06ddfe and 3f95620 for details.
The following script triggers use-after-free on trunk, 2.4.6, 2.5.5
and 2.6.3. Credits to @wanabe for using eval as a cross-version way
of registering a fstring.
```ruby
a = ('j' * 24).b.b
eval('', binding, a)
p a
4.times { GC.start }
p a
```
- string.c (str_replace_shared_without_enc): when given a
dependent string, depend on the root of the dependent
string.
[Bug #15934]
* string.c (str_replace_shared_without_enc): free previous buffer
before replaced.
* parse.y (gettable): make sure in advance that the `__FILE__`
object shares a fstring, to get rid of replacement with the
fstring later.
TODO: this hack may be needed in other places.
[Bug #15916]
Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
This is a follow up for 3f9562015e.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.
Such string looks like:
```
--------                    -----------------
| root | ------ owns -----> | root's buffer |
--------                    -----------------
    ^                             ^   ^
-----------                       |   |
| shared1 | ------ references -----   |
-----------                           |
    ^                                 |
-----------                           |
| shared2 | ------ references ---------
-----------
```
This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:
```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2); /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str); /* no frozen check */
}
```
If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:
```
-----------                         --------------------
| shared1 | -------- owns --------> | shared1's buffer |
-----------                         --------------------
    ^
    |
-----------                         -------------------------
| shared2 | ------ references ----> | root's buffer (freed) |
-----------                         -------------------------
```
Here is a reproduction script for the situation this commit fixes.
```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```
- string.c (str_duplicate): always share with the root string when
the original is a shared string.
- test_rb_str_dup.rb: specifically test `rb_str_dup` to make
sure it does not try to share with a shared string.
[Bug #15792]
Closes: https://github.com/ruby/ruby/pull/2159
* string.c (str_duplicate): share the root shared string if the
original string is already sharing, so that all shared strings
refer to the root shared string directly. Indirect sharing can
cause a dangling pointer.
[Bug #15792]
* string.c (rb_str_split_m): warn use of non-nil $;.
* string.c (rb_fs_setter): warn when set to non-nil value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: remove <code> markups, which are not only unnecessary
but also prevented cross-references.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_crypt): fix indent not to make the whole list
verbatim entirely.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_coderange): respect the actual encoding
if a BOM is present, and scan for the actual code range.
[ruby-core:91662] [Bug #15635]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67167 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Officially states that String#dump is intended for round-trip.
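A short sketch of the round trip this statement refers to, assuming String#undump is available as the inverse of String#dump. The sample string is illustrative.

```ruby
original = "caf\u00e9\n\t\0"   # non-ASCII, control characters and a NUL byte
dumped   = original.dump       # printable, ASCII-only representation
restored = dumped.undump       # parses the dumped form back

puts dumped
p restored == original
```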
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* eval_error.c (print_errinfo): defer escaping control char in
error messages until writing to stderr, instead of quoting at
building the message. [ruby-core:90853] [Bug #15497]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66753 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
And its friends: lines, chars, grapheme_clusters, and codepoints.
[Feature #6670] [ruby-core:90728]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
And its friends: lines, chars, grapheme_clusters, and codepoints.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, there is no usage of capitalizing the first letter
in a word or a sentence. Words with mixed capitalization are not used
at all.
We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.
* enc/unicode.c: Actual implementation
* string.c: Add mention of this special case for documentation
* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
that uses String#capitalize on some (including nonsensical)
combinations of MTAVRULI and Mkhedruli, and a canary test to
detect the potential assignment of characters to the currently
open slots (holes) at U+1CBB and U+1CBC.
* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
expectation data.
Together with r65933, this closes issue #14839.
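A sketch of the behavior described above, assuming a Ruby whose Unicode data includes MTAVRULI. The word (the city name Tbilisi) and the variable names are illustrative; code points are written as escapes to keep the source ASCII.

```ruby
# "Tbilisi" in MTAVRULI ('uppercase') and in Mkhedruli (lowercase).
mtavruli  = "\u1C97\u1C91\u1C98\u1C9A\u1C98\u1CA1\u1C98"
mkhedruli = "\u10D7\u10D1\u10D8\u10DA\u10D8\u10E1\u10D8"

# capitalize = downcase the whole string, then titlecase the first letter.
# Titlecase is the identity for Georgian, so no mixed case is produced.
p mtavruli.capitalize == mtavruli.downcase
p mtavruli.capitalize == mkhedruli
p mkhedruli.capitalize == mkhedruli
```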
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Especially over checking argc then calling rb_scan_args just to
raise an ArgumentError.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66238 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Unicode Text Segmentation considers CRLF as a character. [Bug #15337]
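A brief illustration, assuming String#grapheme_clusters follows the segmentation rule mentioned above. The sample text is made up.

```ruby
text = "a\r\nb"

clusters = text.grapheme_clusters   # CRLF stays together as one cluster
chars    = text.chars               # code points are still separate

p clusters
p chars
```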
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
It seems that decades ago, ruby was written under the assumption that
char is unsigned, which is of course a false assumption. We
need to explicitly store a numeric value into an unsigned char
variable to indicate that we expect a value in 0..255.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The behaviour of String#setbyte has been depending on the width
of int, which is not portable. Must check explicitly.
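A sketch of what the explicit check means for callers, under the assumption (my reading of this change, not stated above) that out-of-range integers are reduced to their low 8 bits regardless of platform. The values are illustrative.

```ruby
s = "abc".dup

# 0x141 does not fit in a byte; only the low 8 bits (0x41, "A") are stored.
s.setbyte(0, 0x141)

p s.getbyte(0)
p s
```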
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65804 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Looking at the lines right above, it is as clear as a blue sky
that we cannot assume `p` to be aligned at all when
UNALIGNED_WORD_ACCESS is true. It is wrong to use
__builtin_assume_aligned in that situation.
See also: https://travis-ci.org/ruby/ruby/jobs/451710732#L2007
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65592 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
These APIs are much like <valgrind/memcheck.h>. Use them to
fine-grain annotate the usage of our memory.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
succeeds, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It means that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
(re-commit of r65444)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
succeeds, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It means that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] improve docs for String#{strip,lstrip,rstrip}{,!}:
small clarification, avoid referring to the receiver as `str'
(does not appear in the call-seq of the generated HTML docs),
enable links for cross-references, simplify rdoc.
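A quick sketch of the strip family covered by these docs, including the bang form returning nil when the receiver is unaltered. The sample strings are illustrative.

```ruby
padded = "\t  hello  \n"

both  = padded.strip    # leading and trailing whitespace removed
left  = padded.lstrip   # only leading whitespace removed
right = padded.rstrip   # only trailing whitespace removed

# The bang variants modify the receiver and return nil if nothing changed.
unchanged = "hello".dup.strip!

p both, left, right, unchanged
```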
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (get_reg_grapheme_cluster): show error info and relax
to rb_fatal from rb_bug.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] move unaltered case for String#strip to the end,
similar to other strip methods.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The former states explicitly that the argument must be a literal,
and can optimize away `strlen` on all compilers.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65059 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
`ptr` for these functions must refer to constant string literals.
Otherwise, the result string's content can be modified/discarded
unexpectedly.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65058 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Document about optional getline arguments
* Add examples, especially for the demonstration of `chomp: true`
[Fix GH-1886]
From: Koki Takahashi <hakatasiloving@gmail.com>
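A small demonstration of the optional getline-style arguments, assuming the documented methods are String#each_line and String#lines. The sample text is illustrative.

```ruby
text = "one\ntwo\nthree"

plain   = text.lines                 # separators are kept by default
chomped = text.lines(chomp: true)    # chomp: true drops each line terminator
by_t    = text.each_line("t").to_a   # a custom separator instead of "\n"

p plain, chomped, by_t
```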
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63610 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_aset): prefer BUILTIN_TYPE over TYPE after
SPECIAL_CONST_P check.
* string.c (rb_str_start_with): prefer RB_TYPE_P over switch by
TYPE.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63543 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Building with HAVE_MALLOC_USABLE_SIZE currently makes
SIZED_REALLOC_N ignore the old size arg.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63487 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Another part of the plan to reduce dependencies on malloc_usable_size:
https://bugs.ruby-lang.org/issues/10238
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63485 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* range.c (range_each_func): adjust the signature of the callback
function to rb_str_upto_each, and exit the loop if the callback
returned non-zero.
* string.c (rb_str_upto_endless_each): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (scan_once): fix the matched substring with `\K`, the
beginning of that string may differ from the matched position.
[ruby-core:86663] [Bug #14707]
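A minimal example of the fixed behavior: with `\K`, the reported match starts after the kept-out prefix, so String#scan should return only that part. The input is illustrative.

```ruby
# \K discards "foo" from the reported match, leaving only what follows.
matches = "foobar foobaz".scan(/foo\Kba./)

p matches
```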
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63252 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Typical usages:
```
p ary[1..] # drop the first element; identical to ary[1..-1]
(1..).each {|n|...} # iterate forever from 1; identical to 1.step{...}
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63192 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_dump): get rid of an error on evaling with
frozen-string-literal enabled. [ruby-core:86539] [Bug #14687]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_undump): check for suffix before if Unicode escape
conflicts with it. the message "but used force_encoding" sounds
strange when it is not used.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63162 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The documentation didn't mention trailing spaces and the
example only demonstrated the case with leading spaces.
[Fix GH-1845]
From: Rodrigo Rosenfeld Rosas <rr.rosas@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_split_m): yield each split substring if a
block is given, instead of returning the array. [Feature #4780]
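A short sketch of the block form, assuming a Ruby that includes this feature. The names `csv` and `seen` are illustrative.

```ruby
csv  = "a,b,c"
seen = []

# With a block, each substring is yielded and no array is built;
# the call returns the receiver itself.
result = csv.split(",") { |field| seen << field.upcase }

p seen
p result.equal?(csv)
```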
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
tool/ruby_vm/views/_insn_name_info.erb: on Linux, rb_vm_insn_name_offset
was needed to compile with --jit-debug (Usually --jit-debug requires
more symbols than the situation without --jit-debug because -O2 skips
some functions to compile).
vm.c: when running transform_mjit_header.rb with --jit-wait,
rb_source_location_cstr was reported to be missing.
string.c: ditto, for rb_str_eql
numeric.c: ditto, for rb_float_eql
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only is it perfectly safe
with JIT disabled, because unlike MJIT it does not replace VM
instructions, but with JIT enabled it also runs Ruby applications,
including Rails applications, stably.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Part of this is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than code generated by clang. The
benchmark may be worse on macOS. The following benchmark results were
obtained with gcc on Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Change `%08x` to `%016x` because of two reasons:
* `%016x` demonstrates that we can use two or more digits here.
* Currently, many people use 64-bit environments.
(I'm unsure if object_id is a good example here, though...)
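For reference, a tiny sketch of the two width specifiers. The value is an arbitrary example rather than an object_id.

```ruby
value = 0xdeadbeef

narrow = "%08x" % value    # zero-padded to at least 8 hex digits
wide   = "%016x" % value   # zero-padded to at least 16 hex digits

p narrow, wide
```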
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_substr): substring of broken code range string may
be valid or broken. patch by tommy (Masahiro Tomita) at
[ruby-dev:50430] [Bug #14388].
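A short illustration of why a substring cannot simply inherit the "broken" code range of its source. The byte values are illustrative.

```ruby
broken = "\xFFabc"       # UTF-8 string containing one invalid byte

tail = broken[1, 3]      # skips the invalid byte
head = broken[0, 1]      # contains only the invalid byte

p broken.valid_encoding?
p tail.valid_encoding?
p head.valid_encoding?
```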
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62040 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
These casts are guarded. Must be safe to assume alignments.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61829 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (NONASCII_MASK): should cause preprocess error immediately if the
compiler does not satisfy our assumptions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): fix out-of-bounds access when
record separator is longer than the last element. [Bug #14257]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Don't assume long long == 8 bytes.
If you can assume C99, there are macros such as UINT64_C that
provide the appropriate integer literal suffixes.
If you can't, there is no way but to do a bitwise OR.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61594 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): should chomp record separator
only, but not a newline, at the end of the receiver as well as
middle, if the separator is given.
[ruby-core:84552] [Bug #14257]
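A sketch of the intended behavior as I read it: when a separator is given, `chomp: true` removes only that separator, and a trailing newline is left alone. The inputs are illustrative.

```ruby
text = "a,b\n"

with_sep = text.each_line(",", chomp: true).to_a   # only "," is chomped
default  = text.each_line(chomp: true).to_a        # default separator: newline is chomped

# A separator longer than the last element must simply not match.
long_sep = "ab".each_line("abcdef", chomp: true).to_a

p with_sep, default, long_sep
```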
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_undump): use rb_enc_find_index2 to find encoding
by unterminated string. check the format before encoding name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61396 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Use ALLOCV to allocate struct crypt_data for slightly cleaner and less
error-prone code. It is currently possible it leaks when an invalid
argument is passed to String#crypt or rb_str_new_cstr() fails to
allocate memory.
SIZEOF_CRYPT_DATA macro in missing/crypt.h is removed since it is not
used any longer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] remove a misleading call-seq for String#concat,
which suggests that all arguments must be Integers in this case;
also clarify in the example that the receiver is modified;
fix grammar for String#<<; move references to the end.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_prepend_multi): Prepend the string without generating
temporary String object if only one argument is given.
This is very similar to https://github.com/ruby/ruby/pull/1634
String#prepend -> 47.5 % up
[Fix GH-1670] [ruby-core:82195] [Bug #13773]
* Before
String#prepend 1.517M (± 1.8%) i/s - 7.614M in 5.019819s
* After
String#prepend 2.236M (± 3.4%) i/s - 11.234M in 5.029716s
* Test code
  require 'benchmark/ips'
  Benchmark.ips do |x|
    x.report "String#prepend" do |loop|
      loop.times { "!".prepend("hello") }
    end
  end
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Split String#<< and String#concat docs to reflect single and multiple
arguments
patched by MSP-Greg [fix GH-1614]
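A compact example of the distinction the split docs draw: String#<< takes exactly one argument, while String#concat accepts any number. The values are illustrative.

```ruby
s = "a".dup

s << "b"             # single argument, returns the receiver for chaining
s.concat("c", "d")   # multiple arguments are appended in order
s.concat(33)         # an Integer is appended as a code point ("!")

p s
```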
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This patch adds pre-allocation to string interpolation.
This avoids unnecessary capacity resizing.
For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large. `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).
String interpolation becomes around 72% faster when a large string is created.
* Before
```
Calculating -------------------------------------
Large string interpolation
1.276M (± 5.9%) i/s - 6.358M in 5.002022s
Small string interpolation
5.156M (± 5.5%) i/s - 25.728M in 5.005731s
```
* After
```
Calculating -------------------------------------
Large string interpolation
2.201M (± 5.8%) i/s - 11.063M in 5.043724s
Small string interpolation
5.192M (± 5.7%) i/s - 25.971M in 5.020516s
```
* Test code
```ruby
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"
    t.times do
      "#{a}, #{b}!"
    end
  end

  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"
    t.times do
      "#{a}, #{b}!"
    end
  end
end
```
[Fix GH-1626]
From: Nao Minami <south37777@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_strseq_index): refactor and avoid
call of str_strlen() when offset == 0.
it will improve performance of String#index and #include?
* benchmark/bm_string_index.rb: benchmark for this change
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_succ): clear coderange cache when no alpha-numeric
character case, carried part may become ASCII-only.
[ruby-core:83062] [Bug #13952]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60066 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_split): return duplicated receiver, when no
splits. patched by tompng (tomoya ishida) in [ruby-core:82911],
and the test case by Seiei Miyagi <hanachin@gmail.com>.
[Bug#13925] [Fix GH-1705]
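A small sketch of the fixed behavior: when nothing is split, the single element should be a copy rather than the receiver itself. The names are illustrative.

```ruby
s = "no-commas-here"
parts = s.split(",")

p parts
p parts.first.equal?(s)   # expected to be a different object

# Mutating the returned element must not touch the original string.
parts.first << "!"
p s
```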
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* compile.c (iseq_compile_each0): insert to_s method call, so that
refinements activated at the caller should take place.
[Feature #13812]
* insns.def (tostring): fix up converted object to a string,
infect and fallback.
* insns.def (branchiftype): new instruction for conversion.
branches if TOS is an instance of the given type.
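A sketch of the user-visible effect, assuming a Ruby that includes this feature: a refinement of to_s that is active at the interpolation site is honored. The class and module names (`Celsius`, `PrettyCelsius`, `Report`) are hypothetical.

```ruby
class Celsius
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  def to_s
    degrees.to_s
  end
end

module PrettyCelsius
  refine Celsius do
    def to_s
      "#{degrees} C"
    end
  end
end

module Report
  using PrettyCelsius   # refinement is active only inside this module body

  def self.line(temp)
    "temp=#{temp}"      # interpolation calls the refined to_s
  end
end

plain   = "temp=#{Celsius.new(21)}"      # no refinement active here
refined = Report.line(Celsius.new(21))

p plain, refined
```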
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_scrub): enc can differ from the actual
encoding of the string, the cached coderange is useless then.
[ruby-core:82674] [Bug #13874]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e