* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127
[ruby-core:76371] [Feature #12025]
* string.c (rb_str_buf_new): adjust for above reduction
From Jeremy Evans <code@jeremyevans.net>:
This changes the minimum buffer size for string buffers from 128 to
127. The underlying C buffer is always 1 more than the ruby buffer,
so this changes the actual amount of memory used for the minimum
string buffer from 129 to 128. This makes it much easier on the
malloc implementation, as evidenced by the following code (note that
time -l is used here, but Linux systems may need time -v).
$ cat bench_mem.rb
i = ARGV.first.to_i
Array.new(1000000){" " * i}
$ /usr/bin/time -l ruby bench_mem.rb 128
3.10 real 2.19 user 0.46 sys
289080 maximum resident set size
72673 minor page faults
13 block output operations
29 voluntary context switches
$ /usr/bin/time -l ruby bench_mem.rb 127
2.64 real 2.09 user 0.27 sys
162720 maximum resident set size
40966 minor page faults
2 block output operations
4 voluntary context switches
To try to ensure a power-of-2 growth, when a ruby string capacity
needs to be increased, after doubling the capacity, add one. This
ensures the ruby capacity will be odd, which means actual amount
of memory used will be even, which is probably better than the
current case of the ruby capacity being even and the actual amount
of memory used being odd.
A very similar patch was proposed 4 years ago in feature #5875. It
ended up being rejected, because no performance increase was shown.
One reason for that is that ruby does not use STR_BUF_MIN_SIZE
unless rb_str_buf_new is called, and that previously did not have
a ruby API, only a C API, so unless you were using a C extension
that called it, there would be no performance increase.
With the recently proposed feature #12024, String.buffer is added,
which is a ruby API for creating string buffers. Using
String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage. As measured above, memory usage is 44% less,
and performance is 17% better.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
termlen and resize heap for the terminator. This is split from
rb_str_fill_terminator (str_fill_term) because filling terminator
and changing terminator length are different things. [Bug #12536]
* internal.h: declaration for rb_str_change_terminator_length.
* string.c (str_fill_term): Simplify only to zero-fill the terminator.
For non-shared strings, it assumes that (capa + termlen) bytes of
heap is allocated. This partially reverts r55557.
* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
is used, and it should be called whenever the termlen is changed.
* string.c (str_capacity): New static function to return capacity
of a string with the given termlen, because the termlen may
sometimes be different from TERM_LEN(str) especially during
changing termlen or filling terminator with specific termlen.
* string.c (rb_str_capacity): Use str_capacity.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ChangeLog about the reverted changes are also deleted in this file.
[Bug #12536] [ruby-dev:49699] [ruby-dev:49702]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
of memory for termlen should always be needed.
In this fix, if possible, decrease capa instead of realloc.
[Bug #12536] [ruby-dev:49699]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Additional fix for [Bug #12536] [ruby-dev:49699].
* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
which is apparently 1 for the encodings.
* string.c (str_new0_cstr): New static function to create a String
object from a C string with specifying termlen.
* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
termlen by using new str_new0_cstr().
* string.c (str_new_static): Specify termlen from the given encoding
when creating a new String object is needed.
* string.c (rb_tainted_str_new_with_enc): New function to create a
tainted String object with the given encoding. This means that
the termlen is correctly specified. Curretly static function.
The function name might be renamed to rb_tainted_enc_str_new
or rb_enc_tainted_str_new.
* string.c (rb_external_str_new_with_enc): Use encoding by using the
above rb_tainted_str_new_with_enc().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is used, TERM_LEN(str) should be considered with it because
embedded strings are also processed by TERM_FILL.
Additional fix for [Bug #12536] [ruby-dev:49699].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Bug #12536] [ruby-dev:49699]
* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
the same as largest value of rb_enc_mbminlen(enc) among encodings.
* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
+TERM_LEN_MAX bytes instead of +1. This change may increase memory
usage.
* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".
* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
of termlen.
* string.c (str_shared_replace): Copy +termlen bytes instead of +1.
* string.c (rb_str_times): termlen should not be included in capa.
* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
termlen should be counted with it because embedded strings are
also processed by TERM_FILL.
* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.
* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_casemap): do not put code with side effects
inside RSTRING_PTR() macro which evaluates the argument multiple
times.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
:ascii option in rb_str_upcase_bang and rb_str_downcase_bang.
* regenc.c: Fix a bug (wrong use of unnecessary slack at end of string).
* regenc.h -> include/ruby/oniguruma.h: Move declaration of
onigenc_ascii_only_case_map so that it is visible in string.c.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
for Unicode case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above
functionality; fixed an encoding issue in assertion error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* missing/crypt.h (struct crypt_data): remove unnecessary member
"initialized".
* missing/crypt.c (des_setkey_r): nothing to be initialized in
crypt_data.
* configure.in (struct crypt_data): check for "initialized" in
struct crypt_data, which may be only in glibc, and isn't on AIX
at least.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping methods.
* enc/unicode.c: Check for invalid string and signal with negative
length value.
* test/ruby/enc/test_case_mapping.rb: Add tests for above.
* test/ruby/test_m17n_comb.rb: Add a message to clarify test failure.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c: prefer crypt_r to crypt iff system crypt nor crypt_r
are not provided.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* configure.in: revert r55237. replace crypt, not crypt_r, and
check if crypt is broken more.
* missing/crypt.c: move crypt_r.c
* string.c (rb_str_crypt): use crypt_r if provided by the system.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
the protective check for the presence of an option.
Update documentation.
* test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (str_transcode0): scrub in the given encoding when
the source encoding is given, not in the encoding of the
receiver. [ruby-core:75732] [Bug #12431]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/ruby.h (RB_INTEGER_TYPE_P): new macro and
underlying inline function to check if the object is an
Integer (Fixnum or Bignum).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
and define CONSTFUNC and PUREFUNC if available.
Note that I don't add those options as default because
it still shows many false-positive (it seems not to consider
longjmp).
* vm_eval.c (stack_check): get rb_thread_t* as an argument
to avoid duplicate call of GET_THREAD().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
only if it can use SSE 4.2 POPCNT whose latency is 3 cycle.
* internal.h (rb_popcount64): use __builtin_popcountll because now
it is in fast path.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_concat): shortcut concatenation to ASCII-8BIT
as well as US-ASCII.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_concat): [DOC] fix the indefinite article, for
replacement from Fixnum to Integer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* configure.in (__builtin_ctz): check.
* configure.in (__builtin_ctzll): check.
* internal.h (rb_popcount32): defined for ntz_int32.
it can use __builtin_popcount but this function is not used on
GCC environment because it uses __builtin_ctz.
When another function uses this, using __builtin_popcount
should be re-considered.
* internal.h (rb_popcount64): ditto.
* internal.h (ntz_int32): defined for ntz_intptr.
* internal.h (ntz_int64): defined for ntz_intptr.
* internal.h (ntz_intptr): defined as ntz for uintptr_t.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (enc_succ_alnum_char): try to skip an invalid character
gap between GREEK CAPITAL RHO and SIGMA.
[ruby-core:74478] [Bug #12204]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54210 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (sym_match_m): delegate to String#match but not
String#=~. [ruby-core:72864] [Bug #11991]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_dump): share same string literal instead of a
magic number.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53774 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_external_str_with_enc, rb_str_concat, rb_str_dump):
use encoding index as shortcut without rb_encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_fstring_enc_new, rb_fstring_enc_cstr): functions to
make fstring with encoding.
* re.c (rb_reg_initialize): make fstring without copying.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* error.c (rb_assert_failure): assertion with stack dump.
* ruby_assert.h (RUBY_ASSERT): new header for the assertion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53615 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
upcase/downcase/capitalize/swapcase. :lithuanian can be used for
testing if no specific option is desired.
* test/ruby/enc/test_case_mapping.rb: Adjusted to above.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping. The code path is currently guarded by the :lithuanian
option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
argument of `String#split` is not `$;` but `nil`.
When `nil` is passed as first argument, `$;` is used.
[ci skip] [Bug #11729] [ruby-dev:49378]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53260 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_compat_and_valid): as scrub does nothing for dummy
encoding string now, incompatible encoding is not a matter.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53235 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_scrub): the result should be infected by the
original string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
instead of raising Encoding::CompatibilityError. [Feature #11801]
* string.c (rb_str_escape): added to dump given string like
rb_str_inspect without quotes and always dump in US-ASCII
like rb_str_dump.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_init): rb_id_encoding() returns same ID with
caching.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52982 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_obj_as_string): fstring should not be infected.
re-apply r52872 and fix a typo.
TODO: other frozen strings also may not be.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_obj_as_string): fstring should not be infected.
TODO: other frozen strings also may not be.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (str_make_independent_expand): adjust argument
qualifier to get rid of a VC bug.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52844 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_string_value_cstr): should not raise on frozen
string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52833 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (id_to_s): remove redundant variable
(rb_obj_as_string): trade id_to_s for idTo_s
(rb_str_equal): replace rb_intern(...) with pre-defined ID
(rb_str_cmp_m): ditto
(rb_str_match): ditto
(str_upto_each): ditto
(rb_str_sum): ditto
(Init_String): remove id_to_s initialization
This leads to a minor size reduction on my x86 (32-bit) system:
text data bss dec hex filename
129373 8 32 129413 1f985 string.o-orig
129082 8 8 129098 1f84a string.o
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This function only accept T_STRING (and T_REGEXP).
This patch improves performance of a tiny_segmenter benchmark
(num=2) 2.54sec -> 2.42sec on my machine.
https://github.com/chezou/TinySegmenter.jl/blob/master/benchmark/benchmark.rb
* encoding.c: add ENC_DEBUG and ENC_ASSERT() macros.
* internal.h: add a decl. of rb_enc_check_str().
* string.c (rb_str_plus): use rb_enc_check_str().
* string.c (rb_str_subpat_set): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52350 b2dd03c8-39d4-4d8f-98ff-823fe69b080e