Commit graph

1316 commits

Author SHA1 Message Date
ngoto 2bb292fccf * string.c (str_buf_cat): Fix capa size for embed string.
Fix bug in r55547. [Bug #12536]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 12:35:52 +00:00
normal ed5401a696 string.c: reduce malloc overhead for default buffer size
* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127
  [ruby-core:76371] [Feature #12025]
* string.c (rb_str_buf_new): adjust for above reduction

From Jeremy Evans <code@jeremyevans.net>:

This changes the minimum buffer size for string buffers from 128 to
127.  The underlying C buffer is always 1 more than the ruby buffer,
so this changes the actual amount of memory used for the minimum
string buffer from 129 to 128.  This makes it much easier on the
malloc implementation, as evidenced by the following code (note that
time -l is used here, but Linux systems may need time -v).

$ cat bench_mem.rb
i = ARGV.first.to_i
Array.new(1000000){" " * i}
$ /usr/bin/time -l ruby bench_mem.rb 128
        3.10 real         2.19 user         0.46 sys
    289080  maximum resident set size
     72673  minor page faults
        13  block output operations
        29  voluntary context switches
$ /usr/bin/time -l ruby bench_mem.rb 127
        2.64 real         2.09 user         0.27 sys
    162720  maximum resident set size
     40966  minor page faults
         2  block output operations
         4  voluntary context switches

To try to ensure a power-of-2 growth, when a ruby string capacity
needs to be increased, after doubling the capacity, add one.  This
ensures the ruby capacity will be odd, which means actual amount
of memory used will be even, which is probably better than the
current case of the ruby capacity being even and the actual amount
of memory used being odd.

A very similar patch was proposed 4 years ago in feature #5875. It
ended up being rejected, because no performance increase was shown.
One reason for that is that ruby does not use STR_BUF_MIN_SIZE
unless rb_str_buf_new is called, and that previously did not have
a ruby API, only a C API, so unless you were using a C extension
that called it, there would be no performance increase.

With the recently proposed feature #12024, String.buffer is added,
which is a ruby API for creating string buffers.  Using
String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage.  As measured above, memory usage is 44% less,
and performance is 17% better.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14 23:30:29 +00:00
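The arithmetic in r55686 can be rechecked with a short Ruby sketch. It only redoes the sums from the commit message: `allocation_bytes` and `grow` are illustrative helper names, not functions in string.c, and the one-byte terminator is the assumption stated in the message itself.

```ruby
# Figures copied from the /usr/bin/time output quoted in the commit message.
RSS_BEFORE, RSS_AFTER = 289_080, 162_720   # maximum resident set size
REAL_BEFORE, REAL_AFTER = 3.10, 2.64       # wall-clock seconds

# Bytes requested from malloc: the Ruby capacity plus the terminator
# (assumed to be one byte, as in the message).
def allocation_bytes(capa, termlen = 1)
  capa + termlen
end

# Growth rule described in the message: double the capacity, then add one.
def grow(capa)
  capa * 2 + 1
end

MEMORY_SAVING = (RSS_BEFORE - RSS_AFTER).fdiv(RSS_BEFORE)
SPEEDUP = REAL_BEFORE / REAL_AFTER - 1

capacities = [127, grow(127), grow(grow(127))]
puts capacities.map { |capa| allocation_bytes(capa) }.inspect
puts format("memory: %.0f%% less, speed: %.0f%% better",
            MEMORY_SAVING * 100, SPEEDUP * 100)
```

Starting from 127, each growth step keeps the allocation at a power of two (128, 256, 512), whereas starting from 128 would give 129 bytes for the first allocation.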
ngoto 5eff15d1bd * string.c (rb_str_change_terminator_length): New function to change
termlen and resize heap for the terminator. This is split from
  rb_str_fill_terminator (str_fill_term) because filling terminator
  and changing terminator length are different things. [Bug #12536]

* internal.h: declaration for rb_str_change_terminator_length.

* string.c (str_fill_term): Simplify only to zero-fill the terminator.
  For non-shared strings, it assumes that (capa + termlen) bytes of
  heap is allocated. This partially reverts r55557.

* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
  is used, and it should be called whenever the termlen is changed.

* string.c (str_capacity): New static function to return capacity
  of a string with the given termlen, because the termlen may
  sometimes be different from TERM_LEN(str) especially during
  changing termlen or filling terminator with specific termlen.

* string.c (rb_str_capacity): Use str_capacity.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-05 10:45:23 +00:00
ngoto 3418a277d8 * string.c: Partially reverts r55547 and r55555.
ChangeLog entries about the reverted changes are also deleted in this file.
  [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 18:11:11 +00:00
ngoto 61f2ee0d90 * string.c (str_fill_term): When termlen increases, re-allocation
of memory for termlen should always be needed.
  In this fix, if possible, decrease capa instead of realloc.
  [Bug #12536] [ruby-dev:49699]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 17:32:21 +00:00
ngoto a92a537bf4 * string.c: Specify termlen as far as possible.
Additional fix for [Bug #12536] [ruby-dev:49699].

* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
  which is apparently 1 for the encodings.

* string.c (str_new0_cstr): New static function to create a String
  object from a C string with specifying termlen.

* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
  termlen by using new str_new0_cstr().

* string.c (str_new_static): Specify termlen from the given encoding
  when creating a new String object is needed.

* string.c (rb_tainted_str_new_with_enc): New function to create a
  tainted String object with the given encoding. This means that
  the termlen is correctly specified. Currently static function.
  The function name might be renamed to rb_tainted_enc_str_new
  or rb_enc_tainted_str_new.

* string.c (rb_external_str_new_with_enc): Use encoding by using the
  above rb_tainted_str_new_with_enc().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 11:24:11 +00:00
ngoto 10e28726a1 * string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
is used, TERM_LEN(str) should be considered with it because
  embedded strings are also processed by TERM_FILL.
  Additional fix for [Bug #12536] [ruby-dev:49699].


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 04:50:38 +00:00
ngoto 6734a0c3d9 string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 03:58:51 +00:00
ngoto f2ee22371b * string.c: Fix memory corruptions when using UTF-16/32 strings.
[Bug #12536] [ruby-dev:49699]

* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
  the same as largest value of rb_enc_mbminlen(enc) among encodings.

* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
  +TERM_LEN_MAX bytes instead of +1. This change may increase memory
  usage.

* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
  of termlen.

* string.c (str_shared_replace): Copy +termlen bytes instead of +1.

* string.c (rb_str_times): termlen should not be included in capa.

* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
  termlen should be counted with it because embedded strings are
  also processed by TERM_FILL.

* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-30 10:20:23 +00:00
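Why UTF-16/32 strings need more than one terminator byte can be seen from Ruby itself. The sketch below is only an illustration from the Ruby side: it measures the width of the smallest character in a few encodings, which the message for r55547 says equals the terminator length. The constant names are made up for this example, and the list of encodings is an arbitrary sample.

```ruby
# Width in bytes of the smallest character in each encoding. According to
# the commit message, the terminator has the same width (rb_enc_mbminlen).
MIN_CHAR_WIDTH = %w[US-ASCII UTF-8 UTF-16LE UTF-32LE].map { |name|
  [name, "a".encode(name).bytesize]
}.to_h

# Stand-in for TERM_LEN_MAX, restricted to the sample encodings above.
LONGEST_TERMINATOR = MIN_CHAR_WIDTH.values.max

puts MIN_CHAR_WIDTH.inspect
puts LONGEST_TERMINATOR
```

A buffer allocated with only one extra byte therefore has no room for the two- or four-byte terminator that TERM_FILL writes for the wide encodings.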
nobu bcf0a198f1 CASEMAP_DEBUG [ci skip]
* string.c (rb_str_casemap, rb_str_ascii_casemap): move
  debug/tuning messages under a preprocessor condition,
  CASEMAP_DEBUG.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 08:19:59 +00:00
nobu 3a6bb56029 Fix garbage allocation
* string.c (rb_str_casemap): do not put code with side effects
  inside RSTRING_PTR() macro which evaluates the argument multiple
  times.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:38:16 +00:00
naruse 8272729977 * string.c (rb_str_casemap): fix memory leak.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:14:05 +00:00
naruse 9d291c82e5 * string.c (rb_str_casemap): int is too small for string size.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:14:04 +00:00
nobu 1cbc622ea7 string.c: adjust buffer size
* string.c (tr_trans): adjust buffer size by processed and rest
  lengths, instead of doubling repeatedly.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55428 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16 03:17:54 +00:00
nobu cc9f1e9195 string.c: fix terminator
* string.c (tr_trans): consider terminator length and fix heap
  overflow.  reported by Guido Vranken <guido AT guidovranken.nl>.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55427 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16 02:15:27 +00:00
nobu aaf8c09900 Fix typo in string.c [ci skip]
* string.c (rb_str_oct): [DOC] fix typo, hornored -> honored.
  [Fix GH-1379]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55378 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11 06:02:46 +00:00
duerst 02f7ad6237 * enc/iso_8859_1.c: Implement non-ASCII case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above.
* string.c: Add iso-8859-1 to supported encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55373 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11 00:46:21 +00:00
duerst 10174c295b * string.c: Special-case :ascii option in rb_str_capitalize_bang and
rb_str_swapcase_bang.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55361 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 08:35:17 +00:00
duerst 13f576d6b9 * string.c: Special-case :ascii option in rb_str_upcase_bang (retry).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55359 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 08:12:28 +00:00
nobu 2667d1b38f hash.c: ensure NUL-terminated for ENV
* hash.c (get_env_cstr): ensure NUL-terminated.
  [ruby-dev:49655] [Bug #12475]
* string.c (rb_str_fill_terminator): return the pointer to the
  NUL-terminated content.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55345 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 05:48:38 +00:00
kazu 075cf3d2e8 string.c (rb_str_ascii_casemap): fix compile error.
error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55332 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 14:11:17 +00:00
duerst 872f9a498f * string.c: Revert previous commit (possibility of endless loop).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 13:22:28 +00:00
duerst 5eb73eeda8 * string.c: Special-case :ascii option in rb_str_upcase_bang.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55330 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 12:57:44 +00:00
duerst f0fc6ec872 * string.c: New static function rb_str_ascii_casemap; special-casing
:ascii option in rb_str_upcase_bang and rb_str_downcase_bang.
* regenc.c: Fix a bug (wrong use of unnecessary slack at end of string).
* regenc.h -> include/ruby/oniguruma.h: Move declaration of
  onigenc_ascii_only_case_map so that it is visible in string.c.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 12:28:42 +00:00
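From the Ruby side, the effect of the :ascii option in r55329 can be sketched as follows. This assumes a Ruby release in which full Unicode case mapping is active by default (2.4 or later); the constant names are illustrative only.

```ruby
WORD = "r\u00E9sum\u00E9"          # "résumé"

FULL       = WORD.upcase           # maps every letter, including U+00E9
ASCII_ONLY = WORD.upcase(:ascii)   # touches only a-z, leaves U+00E9 alone

puts FULL
puts ASCII_ONLY
```

The special case lets ASCII-only conversion skip the general Unicode mapping tables.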
duerst 8743f010c6 * string.c (rb_str_upcase_bang, rb_str_capitalize_bang,
rb_str_swapcase_bang): Switch to use primitive.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07 08:18:42 +00:00
duerst 53a3e3ddd9 * string.c (rb_str_downcase_bang): Switch to use primitive except if
conversion can be done ASCII-only.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07 07:44:19 +00:00
duerst ab5f23f26c * string.c: Added UTF-16BE/LE and UTF-32BE/LE to supported encodings
for Unicode case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above
  functionality; fixed an encoding issue in assertion error message.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06 09:36:36 +00:00
duerst 2f49aa8f62 * string.c: Change rb_str_casemap to use encoding primitive
case_map instead of directly calling onigenc_unicode_case_map.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06 04:37:10 +00:00
duerst c5ea268264 * string.c: Remove :lithuanian guard for Unicode case mapping.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55277 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-05 05:46:37 +00:00
nobu 40c3c3ec6c crypt.h: remove initialized
* missing/crypt.h (struct crypt_data): remove unnecessary member
  "initialized".
* missing/crypt.c (des_setkey_r): nothing to be initialized in
  crypt_data.
* configure.in (struct crypt_data): check for "initialized" in
  struct crypt_data, which may be only in glibc, and isn't on AIX
  at least.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-04 01:54:54 +00:00
duerst 3dd98b2446 * string.c: Raise ArgumentError when invalid string is detected in
case mapping methods.
* enc/unicode.c: Check for invalid string and signal with negative
  length value.
* test/ruby/enc/test_case_mapping.rb: Add tests for above.
* test/ruby/test_m17n_comb.rb: Add a message to clarify test failure.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-02 01:24:52 +00:00
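The behaviour introduced in r55253 can be observed with a small sketch. `upcase_or_error` is a hypothetical helper written for this example, and the invalid byte is arbitrary; the sketch assumes a Ruby release that includes this change.

```ruby
# Returns the upcased string, or a description of the ArgumentError
# raised when the input is not valid in its encoding.
def upcase_or_error(str)
  str.upcase
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

# A lone 0xEB byte is not a complete UTF-8 sequence.
INVALID = "\xEB".dup.force_encoding("UTF-8")

puts INVALID.valid_encoding?
puts upcase_or_error("abc")
puts upcase_or_error(INVALID)
```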
nobu a94201243e string.c: fallback to crypt_r
* string.c: prefer crypt_r to crypt iff neither system crypt nor
  crypt_r is provided.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 13:17:31 +00:00
nobu a8bfa9bdf1 use system crypt
* configure.in: revert r55237.  replace crypt, not crypt_r, and
  check if crypt is broken more.
* missing/crypt.c: move crypt_r.c
* string.c (rb_str_crypt): use crypt_r if provided by the system.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 06:58:21 +00:00
nobu 3c31685e11 use crypt_r
* string.c (rb_str_crypt): use reentrant crypt_r.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55237 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 00:48:08 +00:00
naruse e6ff652ce8 Revert r55225
Run test-all before large commit:
"* string.c: Activate full Unicode case mapping for UTF-8 by removing"

This reverts commit 3fb0fcd1e8.
http://rubyci.s3.amazonaws.com/centos5-64/ruby-trunk/log/20160531T013303Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55226 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31 02:56:09 +00:00
duerst 3fb0fcd1e8 * string.c: Activate full Unicode case mapping for UTF-8 by removing
the protective check for the presence of an option.
  Update documentation.
* test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31 01:10:06 +00:00
duerst ae4fba3167 * string.c: Document current behavior for other case mapping methods
on String. [ci skip]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 12:15:41 +00:00
duerst 85950c5257 * string.c: Document current situation for String#downcase. [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55215 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 11:00:26 +00:00
nobu 79a85b18cc string.c: return reallocated pointer
* string.c (str_fill_term): return new pointer reallocated by
  filling terminator.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 07:20:28 +00:00
nobu 9ac5f9135a string.c: get rid of unnecessary empty string
* string.c (str_substr, rb_str_aref): refactor not to create
  unnecessary empty string.
* string.c (str_byte_substr, str_byte_aref): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55209 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 05:50:27 +00:00
nobu e3e8cae9be string.c: check in the order
* string.c (rb_str_aref_m, rb_str_byteslice): check arguments in
  the left-to-right order.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55208 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 05:41:02 +00:00
nobu 4fad63da01 transcode.c: scrub in the given encoding
* transcode.c (str_transcode0): scrub in the given encoding when
  the source encoding is given, not in the encoding of the
  receiver.  [ruby-core:75732] [Bug #12431]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-27 08:09:46 +00:00
nobu b493d156de string.c: integer overflow
* string.c (rb_str_modify_expand): check integer overflow.
  [ruby-core:75592] [Bug #12390]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18 05:52:40 +00:00
nobu 4a9705d6e3 ruby.h: RB_INTEGER_TYPE_P
* include/ruby/ruby.h (RB_INTEGER_TYPE_P): new macro and
  underlying inline function to check if the object is an
  Integer (Fixnum or Bignum).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18 01:17:43 +00:00
naruse 28f5e12c24 * configure.in: check function attribute const and pure,
and define CONSTFUNC and PUREFUNC if available.
  Note that I don't add those options as default because
  it still shows many false positives (it seems not to consider
  longjmp).

* vm_eval.c (stack_check): get rb_thread_t* as an argument
  to avoid duplicate call of GET_THREAD().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-08 17:44:51 +00:00
yui-knk deca1d8007 * string.c (rb_str_sub): Fix a special match variable name.
[ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-05 05:39:35 +00:00
naruse cdef0bc833 * string.c (count_utf8_lead_bytes_with_word): Use __builtin_popcount
only if it can use SSE 4.2 POPCNT whose latency is 3 cycles.

* internal.h (rb_popcount64): use __builtin_popcountll because now
  it is in fast path.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-03 13:14:30 +00:00
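As background for count_utf8_lead_bytes_with_word: the length of a valid UTF-8 string equals the number of bytes that are not continuation bytes (10xxxxxx). The sketch below shows that idea one byte at a time; the C function works on a whole machine word and uses a population count, which this example does not attempt to reproduce. `utf8_char_count` is an illustrative name.

```ruby
# Count the bytes that start a character, i.e. every byte whose top two
# bits are not 10 (continuation bytes are 0x80..0xBF).
def utf8_char_count(str)
  str.bytes.count { |byte| (byte & 0xC0) != 0x80 }
end

sample = "na\u00EFve \u65E5\u672C"
puts utf8_char_count(sample)
puts sample.length
```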
nobu c353ec0c9e string.c: shortcut
* string.c (rb_str_concat): shortcut concatenation to ASCII-8BIT
  as well as US-ASCII.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 03:58:28 +00:00
nobu 321c6df89b string.c: fix doc
* string.c (rb_str_concat): [DOC] fix the indefinite article, for
  replacement from Fixnum to Integer.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 03:53:34 +00:00
nobu 0e3475a6d9 string.c: fix braces
* string.c (search_nonascii): fix braces unmatched by a
  preprocessing condition.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 00:06:04 +00:00
naruse 2fc973796a fix mixed declaration on non UNALIGNED_WORD_ACCESS
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54877 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 18:27:41 +00:00
naruse 64837f778a fix for where UNALIGNED_WORD_ACCESS is not allowed
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54867 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 14:19:02 +00:00
naruse db2c32778d Use WORDS_BIGENDIAN
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54862 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 09:07:14 +00:00
naruse 0f0121fe1a * string.c (search_nonascii): use nlz on big endian environments.
* internal.h (nlz_intpr): defined.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54859 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 22:32:05 +00:00
naruse 424a706afe More optimization for r54854's search_nonascii
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54857 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 16:32:36 +00:00
naruse 4cf460a7bb * string.c (search_nonascii): unroll and use ntz
* configure.in (__builtin_ctz): check.

* configure.in (__builtin_ctzll): check.

* internal.h (rb_popcount32): defined for ntz_int32.
  it can use __builtin_popcount but this function is not used on
  GCC environment because it uses __builtin_ctz.
  When another function uses this, using __builtin_popcount
  should be re-considered.

* internal.h (rb_popcount64): ditto.

* internal.h (ntz_int32): defined for ntz_intptr.

* internal.h (ntz_int64): defined for ntz_intptr.

* internal.h (ntz_intptr): defined as ntz for uintptr_t.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 15:39:02 +00:00
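The idea behind using ntz in search_nonascii can be sketched in Ruby, under the assumption that the scan loads eight bytes at a time as a little-endian word: masking with 0x80 in every byte leaves a set bit for each non-ASCII byte, and the number of trailing zeros locates the first one. The names `HIGH_BITS`, `trailing_zeros` and `first_non_ascii_index` are invented for this illustration and do not mirror the C code.

```ruby
HIGH_BITS = 0x8080_8080_8080_8080  # top bit of each of the 8 bytes

# Number of trailing zero bits ("ntz") of a positive integer.
def trailing_zeros(value)
  (value & -value).bit_length - 1
end

# Byte index of the first non-ASCII byte, or nil if the string is pure ASCII.
def first_non_ascii_index(str)
  bytes = str.b
  offset = 0
  while offset + 8 <= bytes.bytesize
    word = bytes.byteslice(offset, 8).unpack1("Q<")  # little-endian 64-bit load
    hits = word & HIGH_BITS
    # The lowest set bit is bit 7 of the first offending byte.
    return offset + trailing_zeros(hits) / 8 unless hits.zero?
    offset += 8
  end
  # Fewer than 8 bytes remain: fall back to a byte-by-byte scan.
  rest = bytes.byteslice(offset, bytes.bytesize - offset)
  tail = rest.bytes.index { |byte| byte >= 0x80 }
  tail && offset + tail
end

puts first_non_ascii_index("abc\u00E9defgh").inspect
puts first_non_ascii_index("plain ascii text").inspect
```

On a big-endian load the first byte is the most significant one, which is why the follow-up commit r54859 uses nlz (leading zeros) there instead.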
nobu a491508753 string.c: rb_str_concat_literals
* string.c (rb_str_concat_literals): concatenate literal string
  fragments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54490 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-05 08:15:22 +00:00
nobu 0f32783976 string.c: skip invalid char gap
* string.c (enc_succ_alnum_char): try to skip an invalid character
  gap between GREEK CAPITAL RHO and SIGMA.
  [ruby-core:74478] [Bug #12204]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54210 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-21 10:09:33 +00:00
nobu 49a272d728 string.c: Symbol#match
* string.c (sym_match_m): delegate to String#match but not
  String#=~.  [ruby-core:72864] [Bug #11991]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-18 12:06:20 +00:00
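The difference that r53866 makes visible from Ruby code can be sketched like this, assuming a release that contains the change. The symbol and pattern are arbitrary examples.

```ruby
MATCH = :ruby_string.match(/str/)   # delegates to String#match
INDEX = (:ruby_string =~ /str/)     # still returns the match position

puts MATCH.class
puts MATCH.pre_match
puts INDEX
```

Returning MatchData gives callers access to captures and the surrounding text, which a bare index does not.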
nobu 5a6a502ef9 string.c: fix rb_str_init
* string.c (rb_str_init): fix segfault and memory leak, consider
  wide char encoding terminator.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53855 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 11:24:09 +00:00
naruse d092fc5398 Additional fix and tests for r53851
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 10:15:28 +00:00
nobu b6053df008 remove unnecessary declaration so that rdoc works
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53852 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 07:37:20 +00:00
naruse 49dee548f4 fix rubyspec error from r53850
http://rubyci.s3.amazonaws.com/tk2-243-31075/ruby-trunk/log/20160217T061402Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 07:24:13 +00:00
naruse d46e2aea71 * string.c (rb_str_init): introduce String.new(capacity: size)
[Feature #12024]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 03:21:35 +00:00
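A minimal usage sketch of the capacity option from r53850 follows. `build_line` is a hypothetical helper, and 127 is chosen only because it is the minimum buffer size discussed in r55686; the capacity is a preallocation hint and does not change the string's contents.

```ruby
# Build a string by appending parts to a buffer whose storage is
# reserved up front, so early appends need not reallocate.
def build_line(parts)
  buffer = String.new(capacity: 127)
  parts.each { |part| buffer << part }
  buffer
end

line = build_line(%w[commit graph for string.c])
puts line
puts line.bytesize
```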
duerst 2ca7569c6d * string.c, enc/unicode.c: Disassociating ONIGENC_CASE_FOLD flag from
ONIGENC_CASE_DOWNCASE.
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 11:44:12 +00:00
nobu 1bea5a6127 string.c: remove magic number
* string.c (rb_str_dump): share same string literal instead of a
  magic number.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53774 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 03:44:48 +00:00
nobu 6442f02176 string.c: use encoding index
* string.c (rb_external_str_with_enc, rb_str_concat, rb_str_dump):
  use encoding index as shortcut without rb_encoding.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 03:41:16 +00:00
nobu 94c70c7d72 fstring_enc_new
* string.c (rb_fstring_enc_new, rb_fstring_enc_cstr): functions to
  make fstring with encoding.
* re.c (rb_reg_initialize): make fstring without copying.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-04 06:35:34 +00:00
naruse 040ce05610 * string.c (str_new_frozen): if the given string is embeddable
but not embedded, embed a new copied string. [Bug #11946]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53724 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-03 04:52:13 +00:00
naruse 21daa56b2a * re.c: Introduce RREGEXP_PTR.
patch by dbussink.
  partially merge https://github.com/ruby/ruby/pull/497

* include/ruby/ruby.h: ditto.

* gc.c: ditto.

* ext/strscan/strscan.c: ditto.

* parse.y: ditto.

* string.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53715 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-02 04:39:44 +00:00
nobu 439224a590 RUBY_ASSERT
* error.c (rb_assert_failure): assertion with stack dump.
* ruby_assert.h (RUBY_ASSERT): new header for the assertion.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53615 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-22 08:33:55 +00:00
hsbt 4c6713f374 * string.c: fix a typo. [fix GH-1202][ci skip] Patch by @sunboshan
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53571 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-18 02:48:24 +00:00
duerst e580847ce8 * string.c: Any kind of option is now taking the new code path for
upcase/downcase/capitalize/swapcase. :lithuanian can be used for
  testing if no specific option is desired.
* test/ruby/enc/test_case_mapping.rb: Adjusted to above.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 11:40:46 +00:00
duerst 959bbb6f72 * enc/unicode.c: Removed artificial expansion for Turkic,
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
  capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 08:42:16 +00:00
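The Turkic support mentioned in r53562 can be tried from Ruby with the :turkic option, assuming a release in which these options are publicly available (2.4 or later). The constant names are illustrative.

```ruby
DOTTED_CAPITAL_I = "i".upcase(:turkic)    # U+0130, capital I with dot above
DOTLESS_SMALL_I  = "I".downcase(:turkic)  # U+0131, small dotless i

puts DOTTED_CAPITAL_I
puts DOTLESS_SMALL_I
puts "i".upcase                            # default mapping gives plain "I"
```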
duerst c12af76763 * enc/unicode.c: Artificial mapping to test buffer expansion code.
* string.c: Fixed buffer expansion logic.
* test/ruby/enc/test_case_mapping.rb: Tests for above.
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53554 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 08:24:58 +00:00
hsbt 219467abde * enc/unicode.c: fix implicit conversion error with clang. fixup r53548.
* string.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:51:58 +00:00
svn 72fa5a8ee5 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53549 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:24:04 +00:00
duerst be897c2507 * string.c, enc/unicode.c: New code path as a preparation for Unicode-wide
case mapping. The code path is currently guarded by the :lithuanian
  option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:24:03 +00:00
duerst 4a5d3572e6 string.c: made a variable name more grammatically correct
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53510 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-12 09:42:07 +00:00
duerst f23658f1c1 string.c: minor grammar fix [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-12 09:35:00 +00:00
svn d956652d90 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-12 07:03:32 +00:00
duerst 2788cd9849 string.c: Added option parsing/checking for upcase/downcase/
capitalize/swapcase (with Kimihito Matsui)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53503 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-12 07:03:31 +00:00
nobu 2a4729a40d Fix rdoc for String#rstrip!, lstrip! [ci skip]
* string.c (rb_str_lstrip_bang, rb_str_rstrip_bang): [DOC] Fix
  ruby-doc comments for String#rstrip! and lstrip!.  It looks like
  dropped bang.  [Fix GH-1175]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53330 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-27 09:08:17 +00:00
yui-knk adc0898538 * string.c: Fix document. Default value of the first
argument of `String#split` is not `$;` but `nil`.
  When `nil` is passed as first argument, `$;` is used.
  [ci skip] [Bug #11729] [ruby-dev:49378]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53260 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-23 03:02:07 +00:00
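The documented behaviour from r53260 can be illustrated as below. The sketch assumes `$;` has been left at its default of nil, in which case splitting falls back to whitespace; `split_default_equivalent?` is a name made up for this example.

```ruby
# Omitting the pattern and passing nil explicitly behave the same way.
def split_default_equivalent?(text)
  text.split == text.split(nil)
end

puts "a b  c".split(nil).inspect
puts split_default_equivalent?("a b  c")
```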
nobu 9da8a29760 string.c: no exception on dummy encoding
* string.c (str_compat_and_valid): as scrub does nothing for dummy
  encoding string now, incompatible encoding is not a matter.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53235 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-22 06:21:14 +00:00
nobu 61c19c9d43 string.c: infection
* string.c (rb_str_scrub): the result should be infected by the
  original string.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-17 05:16:27 +00:00
nobu 365fae4dd9 string.c: radix indicators [ci skip]
* string.c (rb_str_oct): [DOC] mention radix indicators.
  [ruby-core:71310] [Bug #11648]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-15 04:49:59 +00:00
hsbt 52cd994814 * enum.c: fix a typo in documentation.
[ci skip][fix GH-1140] Patch by @jutaz
* io.c: ditto.
* iseq.c: ditto.
* numeric.c: ditto.
* process.c: ditto.
* string.c: ditto.
* vm_trace.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53105 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-14 02:52:14 +00:00
naruse e3ab670a71 * object.c (rb_inspect): dump inspected result with rb_str_escape()
instead of raising Encoding::CompatibilityError. [Feature #11801]

* string.c (rb_str_escape): added to dump given string like
  rb_str_inspect without quotes and always dump in US-ASCII
  like rb_str_dump.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-10 18:57:08 +00:00
nobu cf183a58de string.c: use rb_id_encoding
* string.c (rb_str_init): rb_id_encoding() returns same ID with
  caching.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52982 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-08 23:41:27 +00:00
usa 4466d4baa9 * string.c (rb_str_init): now accepts new option parameter `encoding'.
[Feature #11785]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-08 16:48:52 +00:00
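A short sketch of the `encoding` option from r52976: the new string receives a copy of the bytes and is tagged with the given encoding, without transcoding. `binary_copy` is a hypothetical helper used only for this illustration.

```ruby
# Copy the bytes of text into a new string tagged as binary.
def binary_copy(text)
  String.new(text, encoding: Encoding::BINARY)
end

original = "caf\u00E9"
copy = binary_copy(original)
puts copy.encoding
puts original.length   # characters in the UTF-8 original
puts copy.length       # bytes, since every byte is a character in binary
```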
duerst d9c6135c5b string.c: removed unused variable
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52944 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-08 08:36:43 +00:00
ko1 6d8bf54c44 * string.c: introduce String#+@ and String#-@ to control
String mutability.
  [Feature #11782]




git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52917 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-07 15:10:00 +00:00
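How the two unary operators from r52917 behave can be sketched as follows. `frozen_then_thawed` is an illustrative helper; whether `-@` returns a shared, deduplicated object depends on the Ruby release, so the example checks only frozenness and content.

```ruby
# -str gives a frozen string with the same content;
# +str gives a string that can be modified.
def frozen_then_thawed(text)
  frozen = -text    # frozen; the argument itself stays as it was
  thawed = +frozen  # unfrozen duplicate, because frozen cannot be modified
  [frozen, thawed]
end

frozen, thawed = frozen_then_thawed("abc".dup)
thawed << "def"
puts frozen.frozen?
puts thawed
puts (+thawed).equal?(thawed)   # an unfrozen receiver is returned as is
```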
nobu cae3905e89 string.c: should not taint fstring
* string.c (rb_obj_as_string): fstring should not be infected.
  re-apply r52872 and fix a typo.
  TODO: other frozen strings also may not be.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-04 07:48:22 +00:00
naruse f2532ab8ca Revert r52872 "string.c: should not taint fstring"
This reverts commit b887c7c20a.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52878 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-04 04:10:00 +00:00
nobu b887c7c20a string.c: should not taint fstring
* string.c (rb_obj_as_string): fstring should not be infected.
  TODO: other frozen strings also may not be.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-03 07:02:19 +00:00
nobu 0167fc15fb string.c: adjust argument qualifier
* string.c (str_make_independent_expand): adjust argument
  qualifier to get rid of a VC bug.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52844 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-02 00:45:32 +00:00
nobu d58f17f37d string.c: no frozen error at cstr
* string.c (rb_string_value_cstr): should not raise on frozen
  string.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52833 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-01 08:13:43 +00:00
normal f9806460e9 string.c: use predefined IDs for minor bloat reduction
* string.c (id_to_s): remove redundant variable
  (rb_obj_as_string): trade id_to_s for idTo_s
  (rb_str_equal): replace rb_intern(...) with pre-defined ID
  (rb_str_cmp_m): ditto
  (rb_str_match): ditto
  (str_upto_each): ditto
  (rb_str_sum): ditto
  (Init_String): remove id_to_s initialization

This leads to a minor size reduction on my x86 (32-bit) system:

   text	   data	    bss	    dec	    hex	filename
 129373	      8	     32	 129413	  1f985	string.o-orig
 129082	      8	      8	 129098	  1f84a	string.o

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-11-07 03:18:58 +00:00
ko1 05b9b42918 * encoding.c (rb_enc_check_str): add for performance.
This function only accepts T_STRING (and T_REGEXP).

  This patch improves performance of a tiny_segmenter benchmark
  (num=2) 2.54sec -> 2.42sec on my machine.
  https://github.com/chezou/TinySegmenter.jl/blob/master/benchmark/benchmark.rb

* encoding.c: add ENC_DEBUG and ENC_ASSERT() macros.

* internal.h: add a decl. of rb_enc_check_str().

* string.c (rb_str_plus): use rb_enc_check_str().

* string.c (rb_str_subpat_set): ditto.




git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52350 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-10-29 09:10:32 +00:00