tool/ruby_vm/views/_insn_name_info.erb: on Linux, rb_vm_insn_name_offset
was needed to compile with --jit-debug. (--jit-debug usually requires
more symbols than a build without it, because -O2 skips compiling
some functions.)
vm.c: when running transform_mjit_header.rb with --jit-wait,
rb_source_location_cstr was reported to be missing.
string.c: ditto, for rb_str_eql
numeric.c: ditto, for rb_float_eql
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, and
`make test-spec` both without JIT and with JIT. It is perfectly safe
with JIT disabled because, unlike MJIT, it does not replace VM
instructions, and with JIT enabled it stably runs Ruby applications,
including Rails applications.
I expect this version to be just an "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to the
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Part of this is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but the rest is
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than code generated by clang, so the
benchmark may be worse on macOS. The following benchmark results were
measured with gcc on Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Change `%08x` to `%016x` because of two reasons:
* `%016x` demonstrates that we can use two or more digits here.
* Currently, many people use 64-bit environments.
(I'm unsure if object_id is a good example here, though...)
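A minimal sketch of the width difference between the two specifiers. The value below is a hypothetical stand-in for an `object_id`, since real ids differ between runs.

```ruby
# Hypothetical id-like value; real object_id values differ per run.
id = 0x2aaab0c4

narrow = format("%08x", id)   # zero-padded to at least 8 hex digits
wide   = format("%016x", id)  # zero-padded to at least 16 hex digits

puts narrow
puts wide
```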
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_substr): substring of broken code range string may
be valid or broken. patch by tommy (Masahiro Tomita) at
[ruby-dev:50430] [Bug #14388].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62040 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
These casts are guarded. Must be safe to assume alignments.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61829 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (NONASCII_MASK): should cause preprocess error immediately if the
compiler does not satisfy our assumptions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): fix out-of-bounds access when
record separator is longer than the last element. [Bug #14257]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Don't assume long long == 8 bytes.
If you can assume C99, there are macros named UINT64_C and
such for appropriate integer literal suffixes.
If you can't, there is no way but to do a bitwise OR.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61594 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): should chomp record separator
only, but not a newline, at the end of the receiver as well as
middle, if the separator is given.
[ruby-core:84552] [Bug #14257]
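A small sketch of the intended behavior, assuming a Ruby that supports the `chomp:` keyword of String#each_line. The sample string is made up for illustration.

```ruby
# With an explicit separator, chomp: true strips only that separator.
# The trailing newline of the last record is not a separator, so it stays.
records = "a;b\n".each_line(";", chomp: true).to_a

p records
```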
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_undump): use rb_enc_find_index2 to find encoding
by unterminated string. check the format before encoding name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61396 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Use ALLOCV to allocate struct crypt_data for slightly cleaner and less
error-prone code. Currently it can leak when an invalid
argument is passed to String#crypt or rb_str_new_cstr() fails to
allocate memory.
SIZEOF_CRYPT_DATA macro in missing/crypt.h is removed since it is not
used any longer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] remove a misleading call-seq for String#concat,
which suggests that all arguments must be Integers in this case;
also clarify in the example that the receiver is modified;
fix grammar for String#<<; move references to the end.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_prepend_multi): Prepend the string without generating
a temporary String object if only one argument is given.
This is very similar to https://github.com/ruby/ruby/pull/1634
String#prepend -> 47.5 % up
[Fix GH-1670] [ruby-core:82195] [Bug #13773]
* Before
String#prepend 1.517M (± 1.8%) i/s - 7.614M in 5.019819s
* After
String#prepend 2.236M (± 3.4%) i/s - 11.234M in 5.029716s
* Test code
require 'benchmark/ips'
Benchmark.ips do |x|
  x.report "String#prepend" do |loop|
    loop.times { "!".prepend("hello") }
  end
end
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Split String#<< and String#concat docs to reflect single and multiple
arguments
patched by MSP-Greg [fix GH-1614]
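For reference, a short sketch of the single-argument and multiple-argument forms that the split docs describe. The strings are arbitrary examples.

```ruby
s = +"a"            # unary plus gives a mutable string

s << "b"            # String#<< takes exactly one argument
s.concat("c", "d")  # String#concat accepts several arguments
s.concat(33)        # an Integer is appended as a codepoint ("!")

puts s
```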
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This patch adds pre-allocation to string interpolation,
which avoids unnecessary capacity resizing.
For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large. `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).
String interpolation becomes around 72% faster when a large string is created.
* Before
```
Calculating -------------------------------------
Large string interpolation
1.276M (± 5.9%) i/s - 6.358M in 5.002022s
Small string interpolation
5.156M (± 5.5%) i/s - 25.728M in 5.005731s
```
* After
```
Calculating -------------------------------------
Large string interpolation
2.201M (± 5.8%) i/s - 11.063M in 5.043724s
Small string interpolation
5.192M (± 5.7%) i/s - 25.971M in 5.020516s
```
* Test code
```ruby
require 'benchmark/ips'
Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"
    t.times do
      "#{a}, #{b}!"
    end
  end
  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"
    t.times do
      "#{a}, #{b}!"
    end
  end
end
```
[Fix GH-1626]
From: Nao Minami <south37777@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_strseq_index): refactor and avoid
call of str_strlen() when offset == 0.
it will improve performance of String#index and #include?
* benchmark/bm_string_index.rb: benchmark for this change
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_succ): clear the coderange cache in the case of no
alphanumeric characters, since the carried part may become ASCII-only.
[ruby-core:83062] [Bug #13952]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60066 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_split): return duplicated receiver, when no
splits. patched by tompng (tomoya ishida) in [ruby-core:82911],
and the test case by Seiei Miyagi <hanachin@gmail.com>.
[Bug#13925] [Fix GH-1705]
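A minimal sketch of the "no splits" case, with a made-up input string: the single element has the same content as the receiver but is a separate object.

```ruby
source = "no-commas-here"
parts = source.split(",")

same_content = (parts == [source])           # content is identical
same_object  = parts.first.equal?(source)    # identity differs (a duplicate)

p parts, same_content, same_object
```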
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* compile.c (iseq_compile_each0): insert to_s method call, so that
refinements activated at the caller should take place.
[Feature #13812]
* insns.def (tostring): fix up converted object to a string,
infect and fallback.
* insns.def (branchiftype): new instruction for conversion.
branches if TOS is an instance of the given type.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_scrub): enc can differ from the actual
encoding of the string, the cached coderange is useless then.
[ruby-core:82674] [Bug #13874]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (WANTARRAY): make array for the result in method
functions and pass it to enumerator functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (enumerator_wantarray): show warnings at method
functions for proper method names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59732 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_grapheme_clusters): suppress a
maybe-uninitialized warning by old gcc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59730 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_split_m): fix potential bug when rb_memsearch()
matches an octet in the middle of a multi-byte character sequence.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59673 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_rstrip_bang): improve the performance by 50%
for a string pattern, and by 10% for a regexp pattern. get rid
of making an intermediate MatchData, which is not used.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59496 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_initialize): new function to (re)initialize a
string with data and encoding. extracted from
rb_external_str_new_with_enc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59448 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
These caused numerous CI failures I haven't been able to
reproduce [ruby-core:82102]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59364 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The same hash keys may be loaded from tainted data sources
frequently (e.g. parsing headers from socket or loading
YAML data from a file). If a non-tainted fstring already
exists (because the application expects the hash key),
cache and deduplicate the tainted version in the new
tainted_frozen_strings table.
For non-embedded strings, this also allows sharing with the
underlying malloc-ed data.
* vm_core.h (rb_vm_struct): add tainted_frozen_strings
* vm.c (ruby_vm_destruct): free tainted_frozen_strings
(Init_vm_objects): initialize tainted_frozen_strings
(rb_vm_tfstring_table): accessor for tainted_frozen_strings
* internal.h: declare rb_fstring_existing, rb_vm_tfstring_table
* hash.c (fstring_existing_str): remove (moved to string.c)
(hash_aset_str): use rb_fstring_existing
* string.c (rb_fstring_existing): new, based on fstring_existing_str
(tainted_fstr_update): new
(rb_fstring_existing0): new, based on fstring_existing_str
(rb_tainted_fstring_existing): new, special case for tainted strings
(rb_str_free): delete from tainted_frozen_strings table
* test/ruby/test_optimization.rb (test_hash_reuse_fstring): new test
[ruby-core:82012] [Bug #13737]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59354 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Fix a wrong jump so replacing a byte in an ASCII-only string with an
ASCII character won't clear the coderange.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to remove leading substr [Feature #12694] [fix GH-1632]
* string.c (rb_str_delete_prefix_bang): add a new method
to remove prefix destructively.
* string.c (rb_str_delete_prefix): add a new method
to remove prefix non-destructively.
* test/ruby/test_string.rb: add tests.
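A short usage sketch of the two methods at the Ruby level (String#delete_prefix and String#delete_prefix!), using an arbitrary sample string.

```ruby
s = +"hello world"

trimmed   = s.delete_prefix("hello ")    # returns a new string, s is untouched
changed   = s.delete_prefix!("hello ")   # modifies s in place and returns it
unchanged = s.delete_prefix!("hello ")   # nothing left to remove, returns nil

p trimmed, s, unchanged
```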
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_chomp_bang): check if modifiable after checking
an argument and just before modification, as it can get frozen
during the argument conversion to String.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] clarify docs for String#split when called
with limit and capture groups.
Reported by Cichol Tsai. [ruby-core:81505] [Bug #13621]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
To convert an object implicitly, convert_type() has had two steps:
1. looking up the method's id
2. calling the method
The strncmp() and strcmp() calls in convert_type() seem slightly heavy for looking up
the method's id for type conversion.
This patch adds and uses internal APIs (rb_convert_type_with_id, rb_check_convert_type_with_id)
that call the method without looking up the method's id when converting the object.
Array#flatten -> 19 % up
Array#+ -> 3 % up
[ruby-dev:50024] [Bug #13341] [Fix GH-1537]
### Before
Array#flatten 104.119k (± 1.1%) i/s - 525.690k in 5.049517s
Array#+ 1.993M (± 1.8%) i/s - 10.010M in 5.024258s
### After
Array#flatten 124.005k (± 1.0%) i/s - 624.240k in 5.034477s
Array#+ 2.058M (± 4.8%) i/s - 10.302M in 5.019328s
### Test Code
require 'benchmark/ips'
class Foo
  def to_ary
    [1,2,3]
  end
end
Benchmark.ips do |x|
  ary = []
  100.times { |i| ary << i }
  array = [ary]
  x.report "Array#flatten" do |i|
    i.times { array.flatten }
  end
  x.report "Array#+" do |i|
    obj = Foo.new
    i.times { array + obj }
  end
end
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (rb_eql_opt): should call #eql? on Float and
String, not #==.
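The difference matters because #eql? is stricter than #== for these classes. A small Ruby-level sketch of the semantics (it does not exercise the VM helper itself):

```ruby
float_loose  = (1.0 == 1)       # == compares numeric value across types
float_strict = 1.0.eql?(1)      # eql? also requires the same class
str_strict   = "a".eql?("a")    # same class and same content
str_vs_sym   = "a".eql?(:a)     # different class

p float_loose, float_strict, str_strict, str_vs_sym
```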
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_crypt): struct crypt_data defined in
missing/crypt.h is small enough.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* debug_counter.h: add the following counters to measure object types.
obj_free: freed count
obj_str_ptr: freed count of Strings that have an extra buffer.
obj_str_embed: freed count of Strings that don't have an extra buffer.
obj_str_shared: freed count of Strings that have a shared extra buffer.
obj_str_nofree: freed count of Strings that are marked as nofree.
obj_str_fstr: freed count of Strings that are marked as fstr.
obj_ary_ptr: freed count of Arrays that have an extra buffer.
obj_ary_embed: freed count of Arrays that don't have an extra buffer.
obj_obj_ptr: freed count of Objects (T_OBJECT) that have an extra buffer.
obj_obj_embed: freed count of Objects that don't have an extra buffer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58865 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
"struct crypt_data" is 131232 bytes on x86-64 GNU/Linux,
making it unsafe to use tiny Fiber stack sizes.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58864 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: make String#{casecmp,casecmp?} return nil for
non-string arguments instead of raising a TypeError.
* test/ruby/test_string.rb: add tests.
Reported by Marcus Stollsteimer. Based on a patch by Shingo Morita.
[ruby-core:80145] [Bug #13312]
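A brief sketch of the resulting behavior, assuming a Ruby that includes this change. The arguments are arbitrary.

```ruby
equal_ignoring_case = "abc".casecmp("ABC")   # 0 for a String argument
non_string_cmp      = "abc".casecmp(1)       # nil instead of TypeError
non_string_pred     = "abc".casecmp?(1)      # nil instead of TypeError

p equal_ignoring_case, non_string_cmp, non_string_pred
```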
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58837 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_external_str_new_with_enc): cut down intermediate
string for conversion source, by appending with conversion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58709 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_external_str_new_with_enc): fix the case of
conversion failure. when conversion failed for some reason,
just ignores the default internal encoding and returns in the
given encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58705 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_external_str_new_with_enc): cut down intermediate
string for conversion source, by appending with conversion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58703 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_unicode_normalize): remove bare Unicode. do
not assume that all compilers can handle UTF-8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58688 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] add example for String#match with pos argument.
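An example of the pos argument might look like the following sketch (not necessarily the example added to the docs). The search starts at the given character offset.

```ruby
from_start = "hello".match(/l/)      # finds the first "l"
from_pos   = "hello".match(/l/, 3)   # starts searching at index 3

p from_start.begin(0)
p from_pos.begin(0)
```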
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] adopt call-seq's for Symbol#{match,match?} from
String methods; other small improvements for Symbol docs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58668 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (unicode_normalize_common): aggregation type cannot be
initialized with dynamic values, in C89.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In string.c, replace the hand-written argument count check with a call to rb_scan_args.
This allows using rb_funcallv once, rather than using rb_funcall twice.
Thanks to Hanmac (Hans Mackowiak) for the idea, see
https://bugs.ruby-lang.org/issues/11078#note-7.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In string.c, refactor the common parts (requiring of unicode_normalize/normalize.rb,
check of number of arguments) of the unicode normalization functions
(rb_str_unicode_normalize, rb_str_unicode_normalize_bang, rb_str_unicode_normalized_p)
into the new function unicode_normalize_common.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58558 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalized?
(including documentation). Leave a comment explaining that the file is now empty.
* string.c: Define String#unicode_normalized? in rb_str_unicode_normalized_p in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalized? to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize!
(including documentation)
* string.c: Define String#unicode_normalize! in rb_str_unicode_normalize_bang in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalize! to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize
(including documentation)
* string.c: Define String#unicode_normalize in rb_str_unicode_normalize in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalize to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
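The Ruby-level behavior of the three methods moved to C in these commits is unchanged. A quick sketch with an "e" followed by a combining acute accent:

```ruby
decomposed = "e\u0301"                          # "e" + COMBINING ACUTE ACCENT
composed   = decomposed.unicode_normalize(:nfc) # returns a new NFC string

was_normalized = decomposed.unicode_normalized?(:nfc)
is_normalized  = composed.unicode_normalized?(:nfc)

buffer = +"e\u0301"
buffer.unicode_normalize!(:nfc)                 # normalizes in place

p composed, was_normalized, is_normalized, buffer
```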
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_change_terminator_length): when called after
the content has been copied, old terminator length no longer
makes sense. use the whole usable size instead of capacity
without terminator. [ruby-core:80257] [Bug #13339]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Make sure it's clear that the returned values are not Unicode codepoints
for encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE).
[ci skip] [Bug #13321]
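To illustrate the point, the sketch below compares the same character in UTF-8 and in EUC-JP. The choice of HIRAGANA LETTER A and EUC-JP is only an example.

```ruby
utf8  = "\u3042"                 # HIRAGANA LETTER A in UTF-8
eucjp = utf8.encode("EUC-JP")    # same character, bytes 0xA4 0xA2

utf8_value  = utf8.ord           # the Unicode codepoint
eucjp_value = eucjp.ord          # an encoding-specific code, not Unicode

printf("%#x %#x\n", utf8_value, eucjp_value)
```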
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Anybody who hits these code paths can hit them again in the
future, so try deduplicating across multiple runs of these
methods to reduce garbage.
* string.c (str_upto_each): fstring on "%.*d"
* strftime.c (rb_strftime_with_timespec): fstring on "%0*d"
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57997 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_casecmp, str_casecmp_p): split to skip argument
check when it is a String certainly.
* string.c (sym_casecmp, sym_casecmp_p): shortcut argument checks.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_cmp_m): use rb_check_string_type for check and
conversion, instead of calling the conversion method directly.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] improve docs of Symbol#casecmp and Symbol#casecmp?
according to the similar String methods; fix RDoc markup and typos;
fix call-seq's for Symbol#{upcase,downcase,capitalize,swapcase}.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57963 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (Init_String): $; must be a GC-root, not to be
collected. [ruby-core:79582]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] specify when String#casecmp and String#casecmp?
return nil; modify examples to better show difference to <=>;
fix RDoc markup and typos.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_update): do not use negation of LONG_MIN, which
is negative too.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_byte_substr): fix another integer overflow which
can happen only when SHARABLE_MIDDLE_SUBSTRING is enabled.
[ruby-core:79951] [Bug #13289]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_subpos): fix integer overflow which can happen
only when SHARABLE_MIDDLE_SUBSTRING is enabled. incorporate
https://github.com/mruby/mruby/commit/7db0786abdd243ba031e24683f
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57797 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: [DOC] restore documentation for String#<<
which became undocumented with r56021; fix a typo.
[ruby-core:79865] [Bug #13268]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57758 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This exposes the rb_fstring internal function to return a
deduped and frozen string when a non-frozen string is given.
This is useful for all sorts of record processing where arbitrary
keys and values may be stored, but certain keys and values are often
duplicated at a high frequency, so the memory savings can be
noticeable.
Use cases are many:
* email/NNTP header processing
There are some standard header keys everybody uses
(From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
as well as common ones specific to certain lists:
(ruby-core has X-Redmine-* headers)
It is also useful to dedupe values, as most inboxes have
multiple messages from the same sender, or MUA.
* package management systems -
things like RubyGems store identical strings for licenses,
dependency names, author names/emails, etc
* HTTP headers/trailers -
standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
are common, but there are also uncommon ones.
Values may be deduped, as well, as it is likely a user
agent will make multiple/parallel requests to the same
server.
* version control systems -
this can be useful for deduplicating names of frequent
committers (like "nobu" :)
In linux.git and git.git, there are also common
trailers such as Signed-Off-By/Acked-by/Reviewed-by/Fixes/...
as well as less common ones.
* audio metadata -
There are commonly used tags (Artist/Album/Title/Tracknumber),
but Vorbis comments allow arbitrary key values to be stored.
Music collections contain songs by the same artist or multiple
songs from the same album, so deduplicating values will be
helpful there, too.
* JSON, YAML, XML, HTML processing
Certain fields, tags and attributes are commonly used
across the same and multiple documents
There is no security concern in this being a DoS vector by
causing immortal strings. The fstring table is not a GC-root
and not walked during the mark phase. GC-able dynamic symbols
since Ruby 2.2 are handled in the same manner, and that
implementation also relies on the non-immortality of fstrings.
[Feature #13077] [ruby-core:79663]
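A sketch of the deduplication from Ruby code. It assumes the Ruby-level entry point is String#-@ (unary minus), as in Ruby 2.5 and later; the message above does not name the method. The header name is just an example.

```ruby
# Two separately built, mutable strings with the same content.
first  = "Content-" + "Type"
second = "Content-" + "Type"

# Assumption: unary minus returns the deduplicated, frozen copy.
a = -first
b = -second

p a.frozen?, a.equal?(b)
```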
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_shared_replace): use RUBY_ASSERT for
pre-condition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): hint to suppress a
maybe-uninitialized warning by gcc.
* thread.c (rb_fd_no_init): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: add example for Symbol#to_s.
The docs for Symbol#to_s only include an example for
Symbol#id2name, but not for #to_s which is an alias;
the docs should include examples for both methods.
From: Marcus Stollsteimer <sto.mar@web.de>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Since the fstring table encompasses all strings in the
symbol table, we may reuse the fstring table walk to set
the class and eliminate the branch in rb_id2str.
* string.c (Init_String): use rb_cString immediately after definition
* symbol.c (rb_id2str): eliminate branch to set class
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Handle the embedded case first, since we may have an embedded
duplicate and non-embedded original string.
* string.c (rb_str_tmp_frozen_release): handled embedded strings
* test/ruby/test_io.rb (test_write_no_garbage): new test
[ruby-core:78898] [Bug #13085]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (STR_IS_SHARED_M): new flag to mark shared multiple times
(STR_SET_SHARED): set STR_IS_SHARED_M
(rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions
(str_new_frozen): set/unset STR_IS_SHARED_M as appropriate
* internal.h: declare new functions
* io.c (fwrite_arg, fwrite_do, fwrite_end): new
(io_fwrite): use new functions
Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release
to manage a hidden, frozen string. Reuse one bit of the embed
length for shared strings as STR_IS_SHARED_M to indicate a string
has been shared multiple times. In the common case, the string
is only shared once so the object slot can be reclaimed immediately.
minimum results in each 3 measurements. (time and size)
Execution time (sec)
name                            trunk   built
io_copy_stream_write            0.682   0.254
io_copy_stream_write_socket     1.225   0.751
Speedup ratio: compare with the result of `trunk' (greater is better)
name                            built
io_copy_stream_write            2.680
io_copy_stream_write_socket     1.630
Memory usage (last size) (B)
name                                    trunk           built
io_copy_stream_write             95436800.000     6512640.000
io_copy_stream_write_socket     117628928.000     7127040.000
Memory consuming ratio (size) with the result of `trunk' (greater is better)
name                            built
io_copy_stream_write            14.654
io_copy_stream_write_socket     16.505
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This seems a bug introduced by r520 (1.4.0). [ruby-core:79110] [Bug #13135]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* file.c (rb_get_path_check_convert): refine the error message
when the path name contains null byte.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_scrub): only one of replacement and block
is allowed. [ruby-core:79038] [Bug #13119]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_enc_str_scrub): yield the invalid part only with
ASCII-incompatible. [ruby-core:79039] [Bug #13120]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57303 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_enumerate_lines): in paragraph mode, do not
include newlines which separate paragraphs, so that it will be
consistent with IO#each_line.
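A small sketch of paragraph mode (empty-string separator), with a made-up input. The extra newline between the paragraphs does not end up in either record.

```ruby
# Three newlines separate the two paragraphs.
paragraphs = "hello\n\n\nworld".each_line("").to_a

p paragraphs
```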
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57184 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_casecmp_p): [DOC] use Unicode escape form to
get rid of warning C4819 by Microsoft Visual C++.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (crypt.h): crypt_r() was added in FreeBSD 12.0 but is
declared in unistd.h. [ruby-core:78664] [Bug #13038]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (chomp_newline): fix chomping a newline-only line.
rb_enc_prev_char returns NULL if there is no previous character, and
rb_enc_ascget must not be called on it. a patch by Ary Borenszweig
<asterite AT gmail.com> at [ruby-core:78666]. [Bug #13037]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_equal): [DOC] fix fallback method name. the
peer's == method will be used, not ===.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_match_m_p): inverse of Regexp#match?. based on
the patch by Herwin Weststrate <herwin@snt.utwente.nl>.
[Fix GH-1483] [Feature #12898]
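A short illustration of how String#match? can be used, assuming it
mirrors Regexp#match? as described above (boolean result, no update of
the last-match variable). The variable names are illustrative only.

```ruby
# A failed =~ resets $~ to nil, which gives a known starting state.
"ruby" =~ /zzz/

matched          = "ruby".match?(/ub/)    # true, no MatchData is created
matched_from_pos = "ruby".match?(/r/, 1)  # false, the search starts at index 1
last_match_after = $~                     # still nil, match? did not set it
```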
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Fix documentation of String#casecmp? (examples didn't have the '?').
Add an example with non-ASCII characters. Clarify that casecmp,
unlike casecmp?, only does case-insensitivity on A-Z/a-z.
[ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (mapping_buffer): get rid of zero-length array member,
which is not a part of C90.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: Implement String#casecmp? and Symbol#casecmp? by using
String#downcase :fold for Unicode case folding. This does not include
options such as :turkic, because these currently cannot be combined
with the :fold option. This implements feature #12786.
* test/ruby/test_string.rb/test_symbol.rb: Tests for above.
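A small sketch contrasting casecmp? with casecmp, assuming UTF-8
strings. The sample values are illustrative only.

```ruby
lower = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut (lower case)
upper = "\u00c4\u00d6\u00dc"  # the same letters in upper case

folded     = lower.casecmp?(upper)  # uses Unicode case folding
ascii_only = lower.casecmp(upper)   # folds only A-Z/a-z, so the strings differ
symbols    = :aBc.casecmp?(:ABC)    # Symbol#casecmp? behaves the same way
```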
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
a hash value of Object might be Bignum, but it causes many troubles,
especially when the Object is used as a key of a hash. so I gave up
doing so.
* array.c (rb_ary_hash): use above macro.
* bignum.c (rb_big_hash): ditto.
* hash.c (rb_obj_hash, rb_hash_hash): ditto.
* numeric.c (rb_dbl_hash): ditto.
* proc.c (proc_hash): ditto.
* re.c (rb_reg_hash, match_hash): ditto.
* string.c (rb_str_hash_m): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
size with int, and of course also not guaranteed the value can be
Fixnum.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (lstrip_offset): add a fast path in the case of single
byte optimizable strings, as well as rstrip_offset.
[ruby-core:77392] [Feature #12788]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (enc_strlen, rb_enc_strlen_cr): Avoid signed integer
overflow. The result type of a pointer subtraction may have the same
size as long. This fixes String#size returning a negative value on
i686-linux environment:
str = "\x00" * ((1<<31)-2)
str.slice!(-3, 3)
str.force_encoding("UTF-32BE")
str << 1234
p str.size
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56247 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
actually check its availability rather than checking GCC's version.
* configure.in (WARN_UNUSED_RESULT): moved to here.
* configure.in (RUBY_FUNC_ATTRIBUTE): change function declaration
to return int rather than void, because it makes no sense for a
warn_unused_result attributed function to return void.
Funny thing however is that it also makes no sense for noreturn
attributed function to return int. So there is a fundamental
conflict between them. While I tested this, I confirmed both
GCC 6 and Clang 3.8 prefer int over void to correctly detect
necessary attributes under this setup. Maybe subject to change
in future.
* internal.h (UNINITIALIZED_VAR): renamed to MAYBE_UNUSED, then
moved to configure.in for the same reason we move
WARN_UNUSED_RESULT.
* configure.in (MAYBE_UNUSED): moved to here.
* internal.h (__has_attribute): deleted, because it has no use now.
* string.c (rb_str_enumerate_lines): refactor macro rename.
* string.c (rb_str_enumerate_bytes): ditto.
* string.c (rb_str_enumerate_chars): ditto.
* string.c (rb_str_enumerate_codepoints): ditto.
* thread.c (do_select): ditto.
* vm_backtrace.c (rb_debug_inspector_open): ditto.
* vsnprintf.c (BSD_vfprintf): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The behavior on signed integer overflow is undefined. On platforms with
sizeof(long)==4, 'len + termlen' can easily overflow, where
len is the string length and termlen is the terminator length.
So, prevent the integer overflow by avoiding adding to a string length,
or casting to size_t before adding where the total size is passed to
{RE,}ALLOC*().
* string.c (STR_HEAP_SIZE, RESIZE_CAPA_TERM, str_new0, rb_str_buf_new,
str_shared_replace, rb_str_init, str_make_independent_expand,
rb_str_resize): Avoid overflow by casting the length to size_t. size_t
should be able to represent LONG_MAX+termlen.
* string.c (rb_str_modify_expand): Check that the new length is in the
range of long before resizing. Also refactor to use RESIZE_CAPA_TERM
macro.
* string.c (str_buf_cat): Fix so that it does not create a negative
length String. Also fix the condition for 'string sizes too big', the
total length can be up to LONG_MAX.
* string.c (rb_str_plus): Check the resulting String length does not
exceed LONG_MAX.
* string.c (rb_str_dump): Fix integer overflow. The dump result will be
longer than the original String.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (STR_EMBEDDABLE_P): Renamed from STR_EMBEDABLE_P(). And use
it in more places.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56155 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (STR_EMBEDABLE_P): extract the predicate macro to tell
if the given length fits in an embedded string, and fix
possible integer overflow.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_change_terminator_length): fix integer overflow
in the case of growing the terminator length when the string length
is around LONG_MAX.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_set_len): The buffer overflow check is wrong. The
space for termlen is allocated outside the capacity returned by
rb_str_capacity(). This fixes r41920 ("string.c: multi-byte
terminator", 2013-07-11). [ruby-core:77257] [Bug #12757]
* test/-ext-/string/test_set_len.rb (test_capacity_equals_to_new_size):
Test for this change. Applying only the test will trigger [BUG].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* array.c (rb_ary_concat_multi): take multiple arguments. based
on the patch by Satoru Horie. [Feature #12333]
* string.c (rb_str_concat_multi, rb_str_prepend_multi): ditto.
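A brief usage sketch of the multi-argument forms, assuming a Ruby
version that includes this feature. The values are illustrative only.

```ruby
list = [1]
list.concat([2, 3], [4])     # appends every argument in order

word = +"b"                  # unary plus yields an unfrozen string
word.concat("c", "d")        # "bcd"
word.prepend("_", "a")       # arguments keep their order at the front
```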
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_fs_setter): check and convert $; value at
assignment.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_split_m): show $; name in error message when it
is a wrong object.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55986 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* *.c: rename rb_funcall2 to rb_funcallv, except for extensions
which are/will be/may be gems. [Fix GH-1406]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Bug #12628]
This patch introduces many changes.
* Introduce concept of "Block Handler (BH)" to represent
passed blocks.
* move rb_control_frame_t::flag to ep[0] (as a special local
variable). This flag represents not only the frame type, but also
env flags such as escaped.
* rename `rb_block_t` to `struct rb_block`.
* Make Proc, Binding and RubyVM::Env objects wb-protected.
Check [Bug #12628] for more details.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
UTF-8 to use upper-case four-digit hexadecimal escapes without braces
where possible [Feature #12419].
* test/ruby/test_string.rb (test_dump): Add tests for above.
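A small illustration of the escape forms described above, assuming
UTF-8 strings and a Ruby version that includes this change. The sample
strings are illustrative only.

```ruby
bmp    = "caf\u00e9"    # code point inside the Basic Multilingual Plane
astral = "\u{1F600}"    # code point that needs more than four hex digits

bmp_dump    = bmp.dump     # four upper-case hex digits, no braces
astral_dump = astral.dump  # braces are still required here

puts bmp_dump
puts astral_dump
```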
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55728 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127
[ruby-core:76371] [Feature #12025]
* string.c (rb_str_buf_new): adjust for above reduction
From Jeremy Evans <code@jeremyevans.net>:
This changes the minimum buffer size for string buffers from 128 to
127. The underlying C buffer is always 1 more than the ruby buffer,
so this changes the actual amount of memory used for the minimum
string buffer from 129 to 128. This makes it much easier on the
malloc implementation, as evidenced by the following code (note that
time -l is used here, but Linux systems may need time -v).
$ cat bench_mem.rb
i = ARGV.first.to_i
Array.new(1000000){" " * i}
$ /usr/bin/time -l ruby bench_mem.rb 128
3.10 real 2.19 user 0.46 sys
289080 maximum resident set size
72673 minor page faults
13 block output operations
29 voluntary context switches
$ /usr/bin/time -l ruby bench_mem.rb 127
2.64 real 2.09 user 0.27 sys
162720 maximum resident set size
40966 minor page faults
2 block output operations
4 voluntary context switches
To try to ensure a power-of-2 growth, when a ruby string capacity
needs to be increased, after doubling the capacity, add one. This
ensures the ruby capacity will be odd, which means actual amount
of memory used will be even, which is probably better than the
current case of the ruby capacity being even and the actual amount
of memory used being odd.
A very similar patch was proposed 4 years ago in feature #5875. It
ended up being rejected, because no performance increase was shown.
One reason for that is that ruby does not use STR_BUF_MIN_SIZE
unless rb_str_buf_new is called, and that previously did not have
a ruby API, only a C API, so unless you were using a C extension
that called it, there would be no performance increase.
With the recently proposed feature #12024, String.buffer is added,
which is a ruby API for creating string buffers. Using
String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage. As measured above, memory usage is 44% less,
and performance is 17% better.
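The growth arithmetic described above can be sketched as follows. This
only models the "double, then add one" rule from the message;
next_capacity is a hypothetical helper, not a function in string.c.

```ruby
# Hypothetical helper: models "after doubling the capacity, add one".
def next_capacity(capa)
  capa * 2 + 1
end

capacities = [127]  # the new STR_BUF_MIN_SIZE
3.times { capacities << next_capacity(capacities.last) }

# The underlying C buffer is one byte larger than the Ruby capacity.
allocated = capacities.map { |capa| capa + 1 }

p capacities  # [127, 255, 511, 1023]
p allocated   # [128, 256, 512, 1024]
```

Starting from 127, every allocation in this model stays a power of two.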
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
termlen and resize heap for the terminator. This is split from
rb_str_fill_terminator (str_fill_term) because filling terminator
and changing terminator length are different things. [Bug #12536]
* internal.h: declaration for rb_str_change_terminator_length.
* string.c (str_fill_term): Simplify only to zero-fill the terminator.
For non-shared strings, it assumes that (capa + termlen) bytes of
heap is allocated. This partially reverts r55557.
* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
is used, and it should be called whenever the termlen is changed.
* string.c (str_capacity): New static function to return capacity
of a string with the given termlen, because the termlen may
sometimes be different from TERM_LEN(str) especially during
changing termlen or filling terminator with specific termlen.
* string.c (rb_str_capacity): Use str_capacity.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ChangeLog entries about the reverted changes are also deleted in this file.
[Bug #12536] [ruby-dev:49699] [ruby-dev:49702]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
of memory for termlen should always be needed.
In this fix, if possible, decrease capa instead of realloc.
[Bug #12536] [ruby-dev:49699]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Additional fix for [Bug #12536] [ruby-dev:49699].
* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
which is apparently 1 for the encodings.
* string.c (str_new0_cstr): New static function to create a String
object from a C string with specifying termlen.
* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
termlen by using new str_new0_cstr().
* string.c (str_new_static): Specify termlen from the given encoding
when creating a new String object is needed.
* string.c (rb_tainted_str_new_with_enc): New function to create a
tainted String object with the given encoding. This means that
the termlen is correctly specified. Currently a static function.
The function name might be renamed to rb_tainted_enc_str_new
or rb_enc_tainted_str_new.
* string.c (rb_external_str_new_with_enc): Use encoding by using the
above rb_tainted_str_new_with_enc().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is used, TERM_LEN(str) should be considered with it because
embedded strings are also processed by TERM_FILL.
Additional fix for [Bug #12536] [ruby-dev:49699].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Bug #12536] [ruby-dev:49699]
* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
the same as largest value of rb_enc_mbminlen(enc) among encodings.
* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
+TERM_LEN_MAX bytes instead of +1. This change may increase memory
usage.
* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".
* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
of termlen.
* string.c (str_shared_replace): Copy +termlen bytes instead of +1.
* string.c (rb_str_times): termlen should not be included in capa.
* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
termlen should be counted with it because embedded strings are
also processed by TERM_FILL.
* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.
* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_casemap): do not put code with side effects
inside RSTRING_PTR() macro which evaluates the argument multiple
times.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
:ascii option in rb_str_upcase_bang and rb_str_downcase_bang.
* regenc.c: Fix a bug (wrong use of unnecessary slack at end of string).
* regenc.h -> include/ruby/oniguruma.h: Move declaration of
onigenc_ascii_only_case_map so that it is visible in string.c.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
for Unicode case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above
functionality; fixed an encoding issue in assertion error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* missing/crypt.h (struct crypt_data): remove unnecessary member
"initialized".
* missing/crypt.c (des_setkey_r): nothing to be initialized in
crypt_data.
* configure.in (struct crypt_data): check for "initialized" in
struct crypt_data, which may be only in glibc, and isn't on AIX
at least.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping methods.
* enc/unicode.c: Check for invalid string and signal with negative
length value.
* test/ruby/enc/test_case_mapping.rb: Add tests for above.
* test/ruby/test_m17n_comb.rb: Add a message to clarify test failure.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: prefer crypt_r to crypt iff neither system crypt nor
crypt_r is provided.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* configure.in: revert r55237. replace crypt, not crypt_r, and
check if crypt is broken more.
* missing/crypt.c: move crypt_r.c
* string.c (rb_str_crypt): use crypt_r if provided by the system.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
the protective check for the presence of an option.
Update documentation.
* test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (str_transcode0): scrub in the given encoding when
the source encoding is given, not in the encoding of the
receiver. [ruby-core:75732] [Bug #12431]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/ruby.h (RB_INTEGER_TYPE_P): new macro and
underlying inline function to check if the object is an
Integer (Fixnum or Bignum).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
and define CONSTFUNC and PUREFUNC if available.
Note that I don't add those options as default because
they still show many false positives (they seem not to consider
longjmp).
* vm_eval.c (stack_check): get rb_thread_t* as an argument
to avoid duplicate call of GET_THREAD().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
only if it can use SSE 4.2 POPCNT whose latency is 3 cycles.
* internal.h (rb_popcount64): use __builtin_popcountll because now
it is in fast path.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_concat): shortcut concatenation to ASCII-8BIT
as well as US-ASCII.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_concat): [DOC] fix the indefinite article, for
replacement from Fixnum to Integer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* configure.in (__builtin_ctz): check.
* configure.in (__builtin_ctzll): check.
* internal.h (rb_popcount32): defined for ntz_int32.
it can use __builtin_popcount but this function is not used on
GCC environment because it uses __builtin_ctz.
When another function uses this, using __builtin_popcount
should be re-considered.
* internal.h (rb_popcount64): ditto.
* internal.h (ntz_int32): defined for ntz_intptr.
* internal.h (ntz_int64): defined for ntz_intptr.
* internal.h (ntz_intptr): defined as ntz for uintptr_t.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (enc_succ_alnum_char): try to skip an invalid character
gap between GREEK CAPITAL RHO and SIGMA.
[ruby-core:74478] [Bug #12204]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54210 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (sym_match_m): delegate to String#match but not
String#=~. [ruby-core:72864] [Bug #11991]
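A short sketch of the difference this delegation makes, assuming a Ruby
version that includes this fix. The symbol and pattern are illustrative
only.

```ruby
result = :ruby.match(/u(b)/)  # MatchData, as with String#match
index  = :ruby =~ /u(b)/      # Integer offset, as with String#=~

captured = result[1]          # first capture group
```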
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_dump): share same string literal instead of a
magic number.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53774 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_external_str_with_enc, rb_str_concat, rb_str_dump):
use encoding index as shortcut without rb_encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_fstring_enc_new, rb_fstring_enc_cstr): functions to
make fstring with encoding.
* re.c (rb_reg_initialize): make fstring without copying.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* error.c (rb_assert_failure): assertion with stack dump.
* ruby_assert.h (RUBY_ASSERT): new header for the assertion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53615 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
upcase/downcase/capitalize/swapcase. :lithuanian can be used for
testing if no specific option is desired.
* test/ruby/enc/test_case_mapping.rb: Adjusted to above.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping. The code path is currently guarded by the :lithuanian
option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e