Commit graph

1637 commits

Author SHA1 Message Date
nobu c05fa459bb quote symbols
* sprintf.c (ruby__sfvextra): quote symbols as identifiers.

* string.c (rb_id_quote_unprintable): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-14 02:35:51 +00:00
k0kubun 288b44328d Export some missing symbols for MJIT
tool/ruby_vm/views/_insn_name_info.erb: on Linux, rb_vm_insn_name_offset
was needed to compile with --jit-debug (Usually --jit-debug requires
more symbols than the situation without --jit-debug because -O2 skips
some functions to compile).

vm.c: when running transform_mjit_header.rb with --jit-wait,
rb_source_location_cstr was reported to be missing.

string.c: ditto, for rb_str_eql
numeric.c: ditto, for rb_float_eql

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-08 13:54:37 +00:00
k0kubun ed935aa5be mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.

This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.

This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).

Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only is it perfectly safe
with JIT disabled, because it does not replace VM instructions unlike
MJIT, but with JIT enabled it also stably runs Ruby applications,
including Rails applications.

I regard this version as just an "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.

common.mk: update dependencies for mjit_compile.c.

internal.h: declare `rb_vm_insn_addr2insn` for MJIT.

vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.

win32/mkexports.rb: export thread/ec functions, which are used by MJIT.

include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.

array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.

I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.

Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>

Part of [Feature #14235]

---

* Known issues
  * Code generated by gcc is faster than clang. The benchmark may be worse
    in macOS. Following benchmark result is provided by gcc w/ Linux.
  * Performance is decreased when Google Chrome is running
  * JIT can work on MinGW, but it doesn't improve performance at least
    in short running benchmark.
  * Currently it doesn't perform well with Rails. We'll try to fix this
    before release.

---

* Benchmark results

Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores

- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option

** Optcarrot fps

Benchmark: https://github.com/mame/optcarrot

|         |2.0.0-p0 |r62186   |JIT off  |JIT on   |
|:--------|:--------|:--------|:--------|:--------|
|fps      |37.32    |51.46    |51.31    |58.88    |
|vs 2.0.0 |1.00x    |1.38x    |1.37x    |1.58x    |

** MJIT benchmarks

Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)

|           |2.0.0-p0 |r62186   |JIT off  |JIT on   |
|:----------|:--------|:--------|:--------|:--------|
|aread      |1.00     |1.09     |1.07     |2.19     |
|aref       |1.00     |1.13     |1.11     |2.22     |
|aset       |1.00     |1.50     |1.45     |2.64     |
|awrite     |1.00     |1.17     |1.13     |2.20     |
|call       |1.00     |1.29     |1.26     |2.02     |
|const2     |1.00     |1.10     |1.10     |2.19     |
|const      |1.00     |1.11     |1.10     |2.19     |
|fannk      |1.00     |1.04     |1.02     |1.00     |
|fib        |1.00     |1.32     |1.31     |1.84     |
|ivread     |1.00     |1.13     |1.12     |2.43     |
|ivwrite    |1.00     |1.23     |1.21     |2.40     |
|mandelbrot |1.00     |1.13     |1.16     |1.28     |
|meteor     |1.00     |2.97     |2.92     |3.17     |
|nbody      |1.00     |1.17     |1.15     |1.49     |
|nest-ntimes|1.00     |1.22     |1.20     |1.39     |
|nest-while |1.00     |1.10     |1.10     |1.37     |
|norm       |1.00     |1.18     |1.16     |1.24     |
|nsvb       |1.00     |1.16     |1.16     |1.17     |
|red-black  |1.00     |1.02     |0.99     |1.12     |
|sieve      |1.00     |1.30     |1.28     |1.62     |
|trees      |1.00     |1.14     |1.13     |1.19     |
|while      |1.00     |1.12     |1.11     |2.41     |

** Discourse's script/bench.rb

Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb

NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
 to fix it. Please wait for the fix.)

*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)

categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101

*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)

categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 11:22:28 +00:00
mame 552a5a993c string.c (rb_str_format_m): Fix the example code of the doc
Change `%08x` to `%016x` because of two reasons:

* `%016x` demonstrates that we can use two or more digits here.
* Currently, many people use 64-bit environments.
  (I'm unsure if object_id is a good example here, though...)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 08:40:22 +00:00
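The `%016x` directive mentioned in the commit above can be tried out with a minimal sketch like the following. The helper name `hex16` is hypothetical and exists only for illustration; it is not part of string.c or of the documentation that the commit changed.

```ruby
# Zero-pad an integer to 16 hexadecimal digits.
# `hex16` is a hypothetical helper name used only for this illustration.
def hex16(number)
  format("%016x", number)
end

p hex16(255)         # => "00000000000000ff"
p hex16(255).length  # => 16
```

A width of 16 hex digits covers a full 64-bit value, which matches the rationale given in the commit message.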
nobu 9237049efe string.c: clear substring code range
* string.c (str_substr): substring of broken code range string may
  be valid or broken.  patch by tommy (Masahiro Tomita) at
  [ruby-dev:50430] [Bug #14388].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62040 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-25 13:10:14 +00:00
shyouhei dc1e6f17ba sizeof(uintptr_t) != sizeof(uintptr_t *)
Reported by mame.  Thanks.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61865 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-16 03:09:53 +00:00
shyouhei 39cfa67b4f __builtin_assume_aligned for *(foo *) casts
These casts are guarded. Must be safe to assume alignments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61829 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-15 02:35:18 +00:00
nobu 6b5e0bd98c exclude flexible array size with old compilers
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61814 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-14 11:19:18 +00:00
mame 982e9e6235 string.c (struct mapping_buffer): Use FLEX_ARY_LEN
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61811 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-13 13:08:05 +00:00
usa b87571100a should cause preprocess error as other cases
* string.c (NONASCII_MASK): should cause preprocess error immediately if the
  compiler does not satisfy our assumptions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-10 03:54:02 +00:00
nobu e9cb552ec9 internal.h: remove dependency on ruby/encoding.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61713 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-09 06:24:11 +00:00
nobu ee85a6e72b internal.h: remove dependency on ruby/io.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-09 06:24:10 +00:00
nobu e043ae7348 string.c: out-of-bounds access
* string.c (rb_str_enumerate_lines): fix out-of-bounds access when
  record separator is longer than the last element.  [Bug #14257]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-06 08:44:17 +00:00
shyouhei beaf2ace87 ULL suffix is a C99ism
Don't assume long long == 8 bytes.

If you can assume C99, there are macros named UINT64_C and
such for appropriate integer literal suffixes.
If you can't, no way but do a bitwise or.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61594 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-04 07:51:17 +00:00
nobu a1bbb2a780 Fix doc typo in Symbol#to_proc [Fix GH-1785]
[ci skip]

From: Dimitris Zorbas <dimitrisplusplus@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61588 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-04 00:44:40 +00:00
nobu 634a48c5c1 string.c: chomp rs at the end
* string.c (rb_str_enumerate_lines): should chomp record separator
  only, but not a newline, at the end of the receiver as well as
  middle, if the separator is given.
  [ruby-core:84552] [Bug #14257]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-29 12:19:03 +00:00
kazu c9bdee3e0d [DOC] Fix typos in downcase [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61488 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-27 00:04:30 +00:00
nobu e2479cc43f encoding.c: rb_enc_find_index2
* string.c (str_undump): use rb_enc_find_index2 to find encoding
  by unterminated string.  check the format before encoding name.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61396 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-22 01:03:17 +00:00
nobu 168c019998 string.c: fix memory leak
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 07:59:00 +00:00
naruse 05d1d29d1f Don't allow mixed escape
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61381 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:09:17 +00:00
naruse 188d85934b move dump format validation into parsing epilogue
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61380 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:09:16 +00:00
naruse 29c6ca423c fix escapes in undump
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61379 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-21 05:08:57 +00:00
nobu 7c18db61a1 string.c: multiple codepoints
* string.c (undump_after_backslash): fix multiple codepoints in
  braces.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16 00:30:52 +00:00
nobu ae18c8f5b6 string.c: suppress warning
* string.c (str_undump): suppress maybe-uninitialized warning by
  gcc 7 and later.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-16 00:03:51 +00:00
tadd bbec11d329 Implement String#undump to unescape String#dump-ed string
[Feature #12275] [close GH-1765]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61228 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-14 08:47:13 +00:00
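As a rough illustration of the String#undump feature introduced by the commit above, the sketch below round-trips a string through `dump` and `undump`. It assumes Ruby 2.5 or later; `dump_roundtrip` is a hypothetical helper name, not something defined by the commit.

```ruby
# Round-trip a string through String#dump and String#undump (Ruby 2.5+).
# `dump_roundtrip` is a hypothetical helper name used only for this sketch.
def dump_roundtrip(str)
  dumped = str.dump        # escaped, quoted, printable form
  [dumped, dumped.undump]  # undump reverses the escaping
end

dumped, restored = dump_roundtrip("tab\there\n")
puts dumped                  # prints the escaped, quoted literal
p restored == "tab\there\n"  # => true
```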
nobu a1692f7fdf string.c: fix rb_external_str_new_with_enc
* string.c (rb_external_str_new_with_enc): do not search non-ascii
  by NULL pointer.  [ruby-core:84055] [Bug #14150]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60979 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-02 07:09:16 +00:00
nobu 73e41247b9 string.c: prefer rb_syserr_fail
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60761 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-14 03:02:58 +00:00
rhe a82aaea719 string.c: fix up r60748
An #ifdef was missing in r60748 and build broke on systems without
crypt_r().

https://rubyci.org/logs/rubyci.s3.amazonaws.com/unstable11s/ruby-trunk/log/20171112T162503Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12 17:10:29 +00:00
rhe 0b845a8458 string.c: fix memory leak in String#crypt
Use ALLOCV to allocate struct crypt_data for slightly cleaner and less
error-prone code. It is currently possible it leaks when an invalid
argument is passed to String#crypt or rb_str_new_cstr() fails to
allocate memory.

SIZEOF_CRYPT_DATA macro in missing/crypt.h is removed since it is not
used any longer.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60748 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-12 15:55:04 +00:00
stomar 8b1c1c55a9 string.c: improve docs for String#{concat,<<}
* string.c: [DOC] remove a misleading call-seq for String#concat,
  which suggests that all arguments must be Integers in this case;
  also clarify in the example that the receiver is modified;
  fix grammar for String#<<; move references to the end.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07 20:15:59 +00:00
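The two points clarified by the documentation commit above (the receiver is modified, and arguments do not all have to be Integers) can be seen in a small sketch. `concat_demo` is a hypothetical name; the multi-argument form of `concat` assumes Ruby 2.4 or later.

```ruby
# Show that String#concat and String#<< modify the receiver in place.
# `concat_demo` is a hypothetical helper name used only for this sketch.
def concat_demo
  s = +"a"            # unary plus gives an unfrozen string
  s.concat("b", "c")  # several String arguments (Ruby 2.4+)
  s << 100            # an Integer is appended as a codepoint ("d")
  s
end

p concat_demo  # => "abcd"
```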
stomar 3262f4910f string.c: fix typos
* string.c: [DOC] fix typos in doxygen comments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60707 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-07 20:11:09 +00:00
stomar 51b0230a9b string.c: improve docs
* string.c: [DOC] fix rdoc for cross reference; fix grammar.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60574 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-29 21:43:36 +00:00
watson1978 b03a44c4ac string.c: Improve String#prepend performance if only one argument is given
* string.c (rb_str_prepend_multi): Prepend the string without generating
    temporary String object if only one argument is given.
	This is very similar with https://github.com/ruby/ruby/pull/1634

	String#prepend -> 47.5 % up

    [Fix GH-1670] [ruby-core:82195] [Bug #13773]

* Before
      String#prepend      1.517M (± 1.8%) i/s -      7.614M in   5.019819s

* After
      String#prepend      2.236M (± 3.4%) i/s -     11.234M in   5.029716s

* Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "String#prepend" do |loop|
    loop.times { "!".prepend("hello") }
  end
end

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-27 14:55:03 +00:00
nobu 2050b50d85 string.c: comment layout [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-22 00:00:41 +00:00
svn 4b8c94dd84 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:49:36 +00:00
sonots 84616bf979 * string.c: [DOC] Split rdoc of String#<< and String#concat [ci skip]
Split String#<< and String#concat docs to reflect single and multiple
arguments

patched by MSP-Greg [fix GH-1614]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:49:35 +00:00
sonots d9e11970f8 * string.c: Remove errant "the" in gsub documentation
patched by jlmuir (J. Lewis Muir) [fix GH-1679]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:35:40 +00:00
nobu 80c50308f9 Improve performance of string interpolation
This patch will add pre-allocation in string interpolation.
This avoids unnecessary capacity resizing.

For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large.  `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).

String interpolation becomes around 72% faster when a large string is created.

* Before
  ```
  Calculating -------------------------------------
  Large string interpolation
                            1.276M (± 5.9%) i/s -      6.358M in   5.002022s
  Small string interpolation
                            5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  ```

* After
  ```
  Calculating -------------------------------------
  Large string interpolation
                            2.201M (± 5.8%) i/s -     11.063M in   5.043724s
  Small string interpolation
                            5.192M (± 5.7%) i/s -     25.971M in   5.020516s
  ```

* Test code
  ```ruby
  require 'benchmark/ips'

  Benchmark.ips do |x|
    x.report "Large string interpolation" do |t|
      a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
      b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

      t.times do
        "#{a}, #{b}!"
      end
    end

    x.report "Small string interpolation" do |t|
      a = "Hello"
      b = "World"

      t.times do
        "#{a}, #{b}!"
      end
    end
  end
  ```

[Fix GH-1626]
From: Nao Minami <south37777@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 23:21:05 +00:00
hsbt 2c27e52f8e Add documentation for `chomp` option.
https://github.com/ruby/ruby/pull/1717

  Patch by @ksss [fix GH-1717]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 16:11:58 +00:00
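For readers unfamiliar with the `chomp` option documented by the commit above, here is a minimal sketch, assuming Ruby 2.4 or later. `lines_without_newlines` is a hypothetical helper name.

```ruby
# Split text into lines, dropping the trailing newline of each line.
# `lines_without_newlines` is a hypothetical helper name for this sketch.
def lines_without_newlines(text)
  text.each_line(chomp: true).to_a
end

p lines_without_newlines("a\nb\n")  # => ["a", "b"]
p "a\nb\n".lines                    # => ["a\n", "b\n"]
```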
sonots ec30bc5930 * string.c (deleted_prefix_length, deleted_suffix_length):
Add doxygen comment.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60254 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 10:33:25 +00:00
naruse 6187b0001b [Feature #13712] String#start_with? supports regexp
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-21 06:51:01 +00:00
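A short sketch of the regexp support added by the commit above, assuming Ruby 2.5 or later. `starts_with_digit?` is a hypothetical helper name.

```ruby
# Check whether a string begins with a digit by passing a Regexp
# to String#start_with? (supported since Ruby 2.5).
# `starts_with_digit?` is a hypothetical helper name for this sketch.
def starts_with_digit?(str)
  str.start_with?(/\d/)
end

p starts_with_digit?("42 apples")  # => true
p starts_with_digit?("apples 42")  # => false
```

The pattern is only tried at the beginning of the string, so a match later in the string does not count.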
glass 8320be1007 string.c: avoid unnecessary call of str_strlen()
* string.c (rb_strseq_index): refactor and avoid
  call of str_strlen() when offset == 0.
  it will improve performance of String#index and #include?

* benchmark/bm_string_index.rb: benchmark for this change

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-10-01 13:44:49 +00:00
nobu 16759238ad string.c: fix ASCII-only on succ
* string.c (str_succ): clear coderange cache when no alpha-numeric
  character case, carried part may become ASCII-only.
  [ruby-core:83062] [Bug #13952]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60066 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-30 00:01:23 +00:00
nobu 2d42119903 string.c: ASCII-incompatible is not ASCII only
* string.c (tr_trans): ASCII-incompatible encoding strings cannot
  be ASCII-only even if valid.  [ruby-core:83056] [Bug #13950]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-29 08:15:50 +00:00
nobu 8c59fdb8d8 dup String#split return value
* string.c (rb_str_split): return duplicated receiver, when no
  splits.  patched by tompng (tomoya ishida) in [ruby-core:82911],
  and the test case by Seiei Miyagi <hanachin@gmail.com>.
  [Bug#13925] [Fix GH-1705]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:07 +00:00
nobu e1be1d0c38 dup String#rpartition return value
* string.c (rb_str_rpartition): return duplicated receiver, when
  no splits.  [ruby-core:82911] [Bug#13925]

Author:    Seiei Miyagi <hanachin@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60001 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:06 +00:00
nobu b0326bce01 dup String#partition return value
* string.c (rb_str_partition): return duplicated receiver, when no
  splits.  [ruby-core:82911] [Bug#13925]

Author:    Seiei Miyagi <hanachin@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-23 07:09:05 +00:00
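The three commits above (split, rpartition, partition) share one visible effect: when nothing is split, the returned element is a copy of the receiver rather than the receiver itself. The sketch below shows that effect under the assumption of Ruby 2.5 or later; `unsplit_pieces` is a hypothetical helper name.

```ruby
# Collect the element that equals the whole receiver when the separator
# does not occur. `unsplit_pieces` is a hypothetical helper name.
def unsplit_pieces(str, sep)
  [str.split(sep).first, str.partition(sep).first, str.rpartition(sep).last]
end

original = "no separator here"
pieces = unsplit_pieces(original, ",")
pieces.each { |piece| piece << "!" }  # mutate only the returned copies
p original                            # => "no separator here"
```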
nobu b2da3824c5 refinements in string interpolation
* compile.c (iseq_compile_each0): insert to_s method call, so that
  refinements activated at the caller should take place.
  [Feature #13812]

* insns.def (tostring): fix up converted object to a string,
  infect and fallback.

* insns.def (branchiftype): new instruction for conversion.
  branches if TOS is an instance of the given type.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-18 02:27:13 +00:00
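The effect of the refinements commit above can be sketched as follows, assuming Ruby 2.5 or later. The names `Celsius`, `CelsiusFormatting` and `Report` are hypothetical and exist only for this illustration.

```ruby
# A plain class without its own to_s.
class Celsius
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end
end

# A refinement that adds to_s for Celsius only where it is activated.
module CelsiusFormatting
  refine Celsius do
    def to_s
      "#{degrees} C"
    end
  end
end

class Report
  using CelsiusFormatting  # active for the rest of this class body

  def self.line(temp)
    "temp: #{temp}"        # interpolation calls the refined to_s
  end
end

puts Report.line(Celsius.new(21))  # prints "temp: 21 C"
```

Outside `Report`, where the refinement is not activated, interpolating a `Celsius` still falls back to the default `Object#to_s`.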
kazu 0f25c6d7d5 Fix a typo [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 13:46:31 +00:00
nobu bd10ce165c string.c: fix false coderange
* string.c (rb_enc_str_scrub): enc can differ from the actual
  encoding of the string, the cached coderange is useless then.
  [ruby-core:82674] [Bug #13874]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 13:11:44 +00:00
nobu faa26f5570 string.c: optimize enumerate_grapheme_clusters
* string.c (rb_str_enumerate_grapheme_clusters): optimize when
  single byte only.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59762 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 12:50:10 +00:00
nobu e568455826 string.c: grapheme clusters on frozen string
* string.c (rb_str_enumerate_grapheme_clusters): enumerate on
  shared frozen string.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59743 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-04 14:04:54 +00:00
nobu 805d6f6f3c string.c: enumerator_element
* string.c (enumerator_element): push or yield elements, and
  return 1 if needs checks.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59742 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-04 14:04:50 +00:00
nobu ded7e1c2a1 string.c: make array in WANTARRAY
* string.c (WANTARRAY): make array for the result in method
  functions and pass it to enumerator functions.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-03 13:21:07 +00:00
nobu adac779218 string.c: enumerator_wantarray
* string.c (enumerator_wantarray): show warnings at method
  functions for proper method names.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59732 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-03 02:08:55 +00:00
nobu 71de56621e string.c: fix for non-Unicode encodings
* string.c (rb_str_enumerate_grapheme_clusters): should enumerate
  chars for non-Unicode encodings.  [Feature #13780]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59731 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-03 01:47:19 +00:00
nobu fc1bf16696 string.c: suppress a warning
* string.c (rb_str_enumerate_grapheme_clusters): suppress a
  maybe-uninitialized warning by old gcc.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59730 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-03 00:39:23 +00:00
nobu bb03f02805 string.c: adjust indent [ci skip]
* string.c (rb_str_enumerate_grapheme_clusters): adjust indent.
  [Feature #13780]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59700 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-08-31 08:07:59 +00:00
naruse df49fc659e String#each_grapheme_cluster and String#grapheme_clusters
added to enumerate grapheme clusters [Feature #13780]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-08-31 06:35:28 +00:00
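To show what the grapheme cluster methods added above distinguish, the following sketch compares cluster and character counts, assuming Ruby 2.5 or later and a UTF-8 string. `cluster_vs_char_count` is a hypothetical helper name.

```ruby
# Compare the number of grapheme clusters with the number of characters.
# `cluster_vs_char_count` is a hypothetical helper name for this sketch.
def cluster_vs_char_count(str)
  [str.grapheme_clusters.size, str.chars.size]
end

# "e" followed by U+0301 (combining acute accent) is one cluster, two chars.
p cluster_vs_char_count("e\u0301")  # => [1, 2]
p cluster_vs_char_count("abc")      # => [3, 3]
```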
glass b3c70d4c7e string.c: fix potential bug in String#split
* string.c (rb_str_split_m): fix potential bug when rb_memsearch()
  matches a octet in the middle of a multi-byte character sequence.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59673 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-08-28 10:55:37 +00:00
naruse d067b263c7 Add optimization for creating zerofill string
```
require 'benchmark'
n = 1 * 1024 * 1024 * 1024
Benchmark.bmbm do |x|
  x.report("*") { 0.chr * n }
  x.report("ljust") { String.new(capacity: n).ljust(n, "\0") }
end
```

Before

```
% ./ruby test.rb
Rehearsal -----------------------------------------
*       0.358396   0.392753   0.751149 (  1.134231)
ljust   0.203277   0.389223   0.592500 (  0.594816)
-------------------------------- total: 1.343649sec

            user     system      total        real
*       0.282647   0.304600   0.587247 (  0.589205)
ljust   0.201834   0.283801   0.485635 (  0.487617)
```

After

```
% ./ruby test.rb
Rehearsal -----------------------------------------
*       0.000522   0.000021   0.000543 (  0.000534)
ljust   0.208551   0.321030   0.529581 (  0.542083)
-------------------------------- total: 0.530124sec

            user     system      total        real
*       0.000069   0.000006   0.000075 (  0.000069)
ljust   0.206698   0.301032   0.507730 (  0.517674)
```

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59614 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-08-17 16:34:40 +00:00
nobu 2b770b4674 string.c: improve String#scan
* string.c (rb_str_rstrip_bang): improve the performance by 50%
  for a string pattern, and by 10% for a regexp pattern.  get rid
  of making an intermediate MatchData, which is not used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59496 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-08-04 04:39:53 +00:00
nobu 8458e709ab string.c: rb_str_initialize
* string.c (rb_str_initialize): new function to (re)initialize a
  string with data and encoding.  extracted from
  rb_external_str_new_with_enc.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59448 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-30 02:56:29 +00:00
sonots 510957df33 string.c: add String#delete_suffix and String#delete_suffix!
to remove trailing suffix [Feature #13665] [Fix GH-1661]

* string.c (rb_str_delete_suffix_bang): add a new method
  to remove suffix destructively.

* string.c (rb_str_delete_suffix): add a new method
  to remove suffix non-destructively.

* test/ruby/test_string.rb: add tests.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59377 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-20 16:29:19 +00:00
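The difference between the non-destructive and destructive variants added by the commit above can be sketched like this, assuming Ruby 2.5 or later. `strip_extension` is a hypothetical helper name.

```ruby
# Remove a trailing extension if present; the receiver is left untouched.
# `strip_extension` is a hypothetical helper name for this sketch.
def strip_extension(filename, ext)
  filename.delete_suffix(ext)
end

p strip_extension("string.c", ".c")   # => "string"
p strip_extension("string.c", ".rb")  # => "string.c"

name = +"string.c"
p name.delete_suffix!(".rb")          # => nil (nothing was removed)
p name.delete_suffix!(".c")           # => "string"
```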
normal 0493b1ce3a revert r59359, r59356, r59355, r59354
These caused numerous CI failures I haven't been able to
reproduce [ruby-core:82102]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59364 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-19 01:35:04 +00:00
normal 86e266bb60 string: preserve taint flag with String#-@ (uminus)
* string.c (tainted_fstr_update): move up
  (rb_fstring): support registering tainted strings
  (register_fstring_tainted): extract from rb_fstring_existing0
  (rb_tainted_fstring_existing): use register_fstring_tainted instead

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59359 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-18 09:52:55 +00:00
normal d04c085b3c hash: keep fstrings of tainted strings for string keys
The same hash keys may be loaded from tainted data sources
frequently (e.g. parsing headers from socket or loading
YAML data from a file).  If a non-tainted fstring already
exists (because the application expects the hash key),
cache and deduplicate the tainted version in the new
tainted_frozen_strings table.

For non-embedded strings, this also allows sharing with the
underlying malloc-ed data.

* vm_core.h (rb_vm_struct): add tainted_frozen_strings
* vm.c (ruby_vm_destruct): free tainted_frozen_strings
  (Init_vm_objects): initialize tainted_frozen_strings
  (rb_vm_tfstring_table): accessor for tainted_frozen_strings
* internal.h: declare rb_fstring_existing, rb_vm_tfstring_table
* hash.c (fstring_existing_str): remove (moved to string.c)
  (hash_aset_str): use rb_fstring_existing
* string.c (rb_fstring_existing): new, based on fstring_existing_str
  (tainted_fstr_update): new
  (rb_fstring_existing0): new, based on fstring_existing_str
  (rb_tainted_fstring_existing): new, special case for tainted strings
  (rb_str_free): delete from tainted_frozen_strings table
* test/ruby/test_optimization.rb (test_hash_reuse_fstring): new test
  [ruby-core:82012] [Bug #13737]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59354 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-18 02:29:59 +00:00
rhe 06a3a10acf string.c: preserve coderange in String#setbyte
Fix a wrong jump so replacing a byte in an ASCII-only string with an
ASCII character won't clear the coderange.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-06 07:21:17 +00:00
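The cached code range that the commit above preserves is not directly observable from Ruby code, so the sketch below only shows the visible result that the cache must stay consistent with. `setbyte_keeps_ascii?` is a hypothetical helper name.

```ruby
# Replace one byte in a copy of the string and report whether the
# result is still ASCII-only.
# `setbyte_keeps_ascii?` is a hypothetical helper name for this sketch.
def setbyte_keeps_ascii?(str, index, byte)
  copy = str.dup
  copy.ascii_only?           # scan once so a code range is computed
  copy.setbyte(index, byte)
  copy.ascii_only?
end

p setbyte_keeps_ascii?("abc", 0, 0x41)  # => true  ("A" is ASCII)
p setbyte_keeps_ascii?("abc", 0, 0xE3)  # => false (non-ASCII byte)
```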
rhe 2c8cec96cc string.c: remove dead code in str_fill_term()
The length of a string never exceeds the capacity.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-06 07:21:16 +00:00
sonots 10082360b9 string.c: add String#delete_prefix and String#delete_prefix!
to remove leading substr [Feature #12694] [fix GH-1632]

* string.c (rb_str_delete_prefix_bang): add a new method
  to remove prefix destructively.

* string.c (rb_str_delete_prefix): add a new method
  to remove prefix non-destructively.

* test/ruby/test_string.rb: add tests.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-06-21 07:43:26 +00:00
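A minimal sketch of the prefix-removal methods added by the commit above, assuming Ruby 2.5 or later. `strip_rb_prefix` is a hypothetical helper name.

```ruby
# Remove a leading "rb_" if present; the receiver is left untouched.
# `strip_rb_prefix` is a hypothetical helper name for this sketch.
def strip_rb_prefix(identifier)
  identifier.delete_prefix("rb_")
end

p strip_rb_prefix("rb_str_new")  # => "str_new"
p strip_rb_prefix("str_new")     # => "str_new"

name = +"str_new"
p name.delete_prefix!("rb_")     # => nil (nothing was removed)
```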
nobu f5052d45be string.c: check just before modification
* string.c (rb_str_chomp_bang): check if modifiable after checking
  an argument and just before modification, as it can get frozen
  during the argument conversion to String.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-06-18 04:38:01 +00:00
stomar 038c2e52d8 string.c: docs for String#split
* string.c: [DOC] clarify docs for String#split when called
  with limit and capture groups.
  Reported by Cichol Tsai.  [ruby-core:81505] [Bug #13621]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59002 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-06-02 21:29:27 +00:00
watson1978 d0015e4ac6 Improve performance of implicit type conversion
To convert the object implicitly, convert_type() has had two parts:
  1. looking up the method's id
  2. calling the method

Seems that strncmp() and strcmp() in convert_type() are slightly heavy to look up
the method's id for type conversion.

This patch will add and use internal APIs (rb_convert_type_with_id, rb_check_convert_type_with_id)
to call the method without looking up the method's id when converting the object.

Array#flatten -> 19 % up
Array#+       ->  3 % up

[ruby-dev:50024] [Bug #13341] [Fix GH-1537]

### Before
       Array#flatten    104.119k (± 1.1%) i/s -    525.690k in   5.049517s
             Array#+      1.993M (± 1.8%) i/s -     10.010M in   5.024258s

### After
       Array#flatten    124.005k (± 1.0%) i/s -    624.240k in   5.034477s
             Array#+      2.058M (± 4.8%) i/s -     10.302M in   5.019328s

### Test Code
require 'benchmark/ips'

class Foo
  def to_ary
    [1,2,3]
  end
end

Benchmark.ips do |x|

  ary = []
  100.times { |i| ary << i }
  array = [ary]

  x.report "Array#flatten" do |i|
    i.times { array.flatten }
  end

  x.report "Array#+" do |i|
    obj = Foo.new
    i.times { array + obj }
  end

end

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-31 12:30:57 +00:00
nobu a77cb8c80f string.c: adjust style [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58897 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-26 06:39:06 +00:00
k0kubun 592c3f9b10 string.c: Optimize String#concat when argc is 1
Optimize performance regression introduced in r56021.

* Benchmark (i7-4790K @ 4.00GHz, x86_64 GNU/Linux)

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("String#concat (1)") { "a".concat("b") }
  if RUBY_VERSION >= "2.4.0"
    x.report("String#concat (2)") { "a".concat("b", "c") }
  end
end

* Ruby 2.3

Calculating -------------------------------------
   String#concat (1)      6.003M (± 5.2%) i/s -     30.122M in   5.031646s

* Ruby 2.4 (Before this patch)

Calculating -------------------------------------
   String#concat (1)      4.458M (± 8.9%) i/s -     22.298M in   5.058084s
   String#concat (2)      3.660M (± 5.6%) i/s -     18.314M in   5.020527s

* Ruby 2.4 (After this patch)

Calculating -------------------------------------
   String#concat (1)      6.448M (± 5.2%) i/s -     32.215M in   5.010833s
   String#concat (2)      3.633M (± 9.0%) i/s -     18.056M in   5.022603s

[fix GH-1631]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-25 11:14:40 +00:00
nobu 7db534a20c vm_insnhelper.c: rb_eql_opt should call eql?
* vm_insnhelper.c (rb_eql_opt): should call #eql? on Float and
  String, not #==.
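
The distinction between the two methods is visible from Ruby. This is a small sketch of the semantics only, not of the VM helper itself; the hash is an illustrative example.

```ruby
# == compares numeric values across types; eql? also requires the same type.
loose  = (1.0 == 1)     # => true
strict = 1.0.eql?(1)    # => false
same   = 1.0.eql?(1.0)  # => true

# Hash lookup relies on eql?, so a Float key does not find an Integer key.
lookup = { 1 => :integer_key }[1.0]  # => nil

p [loose, strict, same, lookup]
```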

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-25 05:29:35 +00:00
normal 2079b71000 string.c: fix String#crypt leak introduced in r58866
* string.c (rb_str_crypt): define LARGE_CRYPT_DATA when allocating

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58876 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-24 21:26:14 +00:00
nobu 92261511b6 string.c: for small crypt_data
* string.c (rb_str_crypt): struct crypt_data defined in
  missing/crypt.h is small enough.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-24 06:55:09 +00:00
ko1 9e1624cfe8 Add debug counters.
* debug_counter.h: add the following counters to measure object types.
  obj_free: freed count
  obj_str_ptr: freed count of Strings that have an extra buffer.
  obj_str_embed: freed count of Strings that don't have an extra buffer.
  obj_str_shared: freed count of Strings that have a shared extra buffer.
  obj_str_nofree: freed count of Strings that are marked as nofree.
  obj_str_fstr: freed count of Strings that are marked as fstr.
  obj_ary_ptr: freed count of Arrays that have an extra buffer.
  obj_ary_embed: freed count of Arrays that don't have an extra buffer.
  obj_obj_ptr: freed count of Objects (T_OBJECT) that have an extra buffer.
  obj_obj_embed: freed count of Objects that don't have an extra buffer.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58865 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-24 06:46:44 +00:00
normal 144e067007 string.c (rb_str_crypt): fix excessive stack use with crypt_r
"struct crypt_data" is 131232 bytes on x86-64 GNU/Linux,
making it unsafe to use tiny Fiber stack sizes.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58864 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-24 03:01:44 +00:00
stomar 40bc846bf8 string.c: fix String#{casecmp,casecmp?} for non-string arguments
* string.c: make String#{casecmp,casecmp?} return nil for
  non-string arguments instead of raising a TypeError.

* test/ruby/test_string.rb: add tests.

Reported by Marcus Stollsteimer.  Based on a patch by Shingo Morita.
[ruby-core:80145] [Bug #13312]
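
A brief sketch of the behaviour after this change, assuming Ruby 2.5 or later; the argument values are illustrative.

```ruby
# Non-string arguments yield nil instead of raising a TypeError.
cmp_non_string  = "abc".casecmp(1)        # => nil
pred_non_string = "abc".casecmp?(1)       # => nil

# String arguments behave as before.
cmp_string  = "abc".casecmp("ABC")        # => 0
pred_string = "abc".casecmp?("ABC")       # => true

p [cmp_non_string, pred_non_string, cmp_string, pred_string]
```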

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58837 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-21 19:28:48 +00:00
nobu f3a49ebc92 string.c: cut down intermediate string
* string.c (rb_external_str_new_with_enc): cut down intermediate
  string for conversion source, by appending with conversion.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58709 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-14 00:21:00 +00:00
nobu 7323de517d revert r58703 & r58705
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58708 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13 16:04:05 +00:00
nobu a3960d0a60 string.c: fix up r58703
* string.c (rb_external_str_new_with_enc): fix the case of
  conversion failure.  when conversion fails for some reason,
  just ignore the default internal encoding and return the string in
  the given encoding.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58705 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13 14:20:19 +00:00
nobu 7678c0d702 string.c: cut down intermediate string
* string.c (rb_external_str_new_with_enc): cut down intermediate
  string for conversion source, by appending with conversion.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58703 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13 13:34:39 +00:00
nobu 7d52ad9e17 string.c: fix one-off bug
* string.c (rb_str_cat_conv_enc_opts): fix one-off bug.  `ofs`
  equals `olen` when appending at the end.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58702 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-13 12:31:01 +00:00
nobu 2c0baa97a9 string.c: remove bare Unicode.
* string.c (rb_str_unicode_normalize): remove bare Unicode.  do
  not assume that all compilers can handle UTF-8.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58688 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-12 15:29:55 +00:00
stomar 8303bc6cf6 string.c: docs for String#match
* string.c: [DOC] add example for String#match with pos argument.
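
The pos argument can be illustrated with a short sketch (the strings here are examples chosen for this note, not the ones added to the docs): matching starts at the given character offset.

```ruby
# Start searching at index 3, so the first "l" (index 2) is skipped.
m = "hello".match(/l/, 3)
offset = m.begin(0)                # => 3

# Nothing matches /h/ when the search starts after index 0.
miss = "hello".match(/h/, 1)       # => nil

p offset
p miss
```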

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11 18:59:45 +00:00
stomar 4a0eaeeb4d string.c: docs for Symbol
* string.c: [DOC] adopt call-seq's for Symbol#{match,match?} from
  String methods; other small improvements for Symbol docs.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58668 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11 18:58:27 +00:00
stomar 477cb2b8d8 string.c: docs for Symbol#{match,match?}
* string.c: [DOC] mention pos argument for Symbol#{match,match?}.
  Patch by Yuki Kurihara (ksss).  [Fix GH-1606]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-11 18:56:32 +00:00
nobu 208240870a string.c: fix r58618
* string.c (unicode_normalize_common): aggregate type cannot be
  initialized with dynamic values, in C89.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-09 14:11:46 +00:00
duerst a4301ec214 replace hand-written argument check by call to rb_scan_args in unicode_normalize_common
In string.c, replace hand-written argument count check by call to rb_scan_args.
This allows using rb_funcallv once, rather than using rb_funcall twice.
Thanks to Hanmac (Hans Mackowiak) for the idea, see
https://bugs.ruby-lang.org/issues/11078#note-7.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-09 11:13:45 +00:00
nobu d7f2c72322 string.c: fix types
* string.c (id_normalize, id_normalized_p): fix types, IDs should
  be ID.

* string.c (unicode_normalize_common): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-06 01:01:52 +00:00
stomar 6ae3cf02f7 string.c: [DOC] improve docs for String.new
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58569 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 13:19:43 +00:00
ktsj 4e6daedc70 string.c: [DOC] Properly refer to keyword argument by its name
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58567 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 08:59:01 +00:00
duerst f47033e237 refactor common parts of unicode normalization functions into unicode_normalize_common
In string.c, refactor the common parts (requiring of unicode_normalize/normalize.rb,
check of number of arguments) of the unicode normalization functions
(rb_str_unicode_normalize, rb_str_unicode_normalize_bang, rb_str_unicode_normalized_p)
into the new function unicode_normalize_common.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58558 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 02:16:27 +00:00
duerst 140560e4ee move definition of String#unicode_normalized? to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalized?
  (including documentation). Leave a comment explaining that the file is now empty.
* string.c: Define String#unicode_normalized? in rb_str_unicode_normalized_p in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalized? to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 02:00:19 +00:00
duerst 90ab1ee023 move definition of String#unicode_normalize! to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize!
  (including documentation)
* string.c: Define String#unicode_normalize! in rb_str_unicode_normalize_bang in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalize! to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 01:36:52 +00:00
svn 7abf8bae23 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-03 12:18:37 +00:00
duerst 5fee67c9ba move definition of String#unicode_normalize to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize
  (including documentation)
* string.c: Define String#unicode_normalize in rb_str_unicode_normalize in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalize to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen
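
For reference, a minimal usage sketch of the method being moved, assuming a standard Ruby installation where the normalization tables under lib/unicode_normalize are available; the sample string is illustrative.

```ruby
# "a" followed by COMBINING GRAVE ACCENT (two code points).
decomposed = "a\u0300"

# NFC composes the pair into the single code point U+00E0.
composed = decomposed.unicode_normalize(:nfc)

# NFD decomposes it again.
roundtrip = composed.unicode_normalize(:nfd)

p decomposed.length   # => 2
p composed.length     # => 1
p roundtrip == decomposed
```

The receiver may be frozen, since unicode_normalize returns a new string.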

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-03 12:18:37 +00:00
nobu cc68af3d02 string.c: improve insertion performance
* string.c (rb_str_splice_0): improve performance of single byte
  optimizable cases, insertion of a 7bit string into a 7bit string.
  [ruby-dev:49984] [Bug #13228]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-17 13:38:34 +00:00
sorah 31a755e4f2 string.c: Suppress logical-op-parentheses warning
* string.c(rb_str_upcase_bang): Suppress logical-op-parentheses warning
  Patch by Fukuo Kadota <fukuo-kadota@cookpad.com>,
  Closes [GH-1570] [Bug #13387].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58211 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-29 11:33:59 +00:00
nobu f0e6e47999 string.c: use the usable size
* string.c (rb_str_change_terminator_length): when called after
  the content has been copied, old terminator length no longer
  makes sense.  use the whole usable size instead of capacity
  without terminator.  [ruby-core:80257] [Bug #13339]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-21 05:28:38 +00:00
duerst a5330fa9ea fix accidental reversal of r57997 in r58000
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58009 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-18 01:35:03 +00:00
duerst 67c1197835 clarify 'codepoint' in documentation of String#each_codepoint
Make sure it's clear that the returned values are not Unicode codepoints
for encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE).

[ci skip] [Bug #13321]
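
The point of the clarification can be sketched as follows. Windows-1252 is used here only as a convenient example of a non-Unicode encoding, and the sketch assumes the corresponding transcoder is installed.

```ruby
euro_utf8   = "\u20AC"                        # EURO SIGN in UTF-8
euro_cp1252 = euro_utf8.encode("Windows-1252")

# In UTF-8 the yielded value is the Unicode code point U+20AC.
unicode_value = euro_utf8.each_codepoint.to_a    # => [8364]

# In Windows-1252 the yielded value is the encoding's own code (0x80),
# which is not a Unicode code point.
native_value = euro_cp1252.each_codepoint.to_a   # => [128]

p unicode_value
p native_value
```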

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17 02:24:53 +00:00
normal 9eb94b4dc1 deduplicate static rb_str_format format strings
Anybody who hits these code paths can hit them again in the
future, so try deduplicating across multiple runs of these
methods to reduce garbage.

* string.c (str_upto_each): fstring on "%.*d"
* strftime.c (rb_strftime_with_timespec): fstring on "%0*d"

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57997 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17 00:55:55 +00:00
nobu 8c661ba264 string.c: shortcut argument check
* string.c (str_casecmp, str_casecmp_p): split to skip argument
  check when it is certainly a String.

* string.c (sym_casecmp, sym_casecmp_p): shortcut argument checks.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-15 07:57:11 +00:00
nobu 9fa56026e5 string.c: use rb_check_string_type
* string.c (rb_str_cmp_m): use rb_check_string_type for check and
  conversion, instead of calling the conversion method directly.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-14 03:42:43 +00:00
stomar e6dec8f92e docs for Symbol#casecmp and Symbol#casecmp?
* string.c: [DOC] improve docs of Symbol#casecmp and Symbol#casecmp?
  according to the similar String methods; fix RDoc markup and typos;
  fix call-seq's for Symbol#{upcase,downcase,capitalize,swapcase}.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57963 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 20:20:40 +00:00
nobu bd17e25588 string.c (rb_str_set_len): pathological check
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57961 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 11:47:45 +00:00
nobu 16e804117c string.c: $; is a GC-root
* string.c (Init_String): $; must be a GC-root, not to be
  collected.  [ruby-core:79582]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 09:12:05 +00:00
stomar 605b472d2d docs for String#casecmp and String#casecmp?
* string.c: [DOC] specify when String#casecmp and String#casecmp?
  return nil; modify examples to better show difference to <=>;
  fix RDoc markup and typos.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-11 20:01:55 +00:00
normal e064879e5a string.c (str_uminus): update doc for deduplication
As of r57698, String#-@ can return pre-existing strings.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08 21:24:24 +00:00
nobu e7f4d90930 fix paren
* string.c (str_byte_substr): fix misplaced parenthesis at r56155.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57809 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08 08:19:56 +00:00
kazu d0708e9e2a string.c: [DOC] Fix a typo in String#dump
[Fix GH-1531][ci skip]
Author:    Alex Semyonov <alex@semyonov.us>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57802 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 13:04:39 +00:00
nobu d69d98f61a string.c: negation of LONG_MIN
* string.c (rb_str_update): do not use negation of LONG_MIN, which
  is negative too.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 09:13:41 +00:00
nobu f4d13801b6 string.c: fix integer overflow
* string.c (str_byte_substr): fix another integer overflow which
  can happen only when SHARABLE_MIDDLE_SUBSTRING is enabled.
  [ruby-core:79951] [Bug #13289]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 09:07:57 +00:00
nobu 72f8df158f string.c: fix integer overflow
* string.c (rb_str_subpos): fix integer overflow which can happen
  only when SHARABLE_MIDDLE_SUBSTRING is enabled.  incorporate
  https://github.com/mruby/mruby/commit/7db0786abdd243ba031e24683f

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57797 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 05:48:15 +00:00
stomar 3ca1cbecc6 string.c: [DOC] fix doc formatting for String#==, #===
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-04 20:08:04 +00:00
stomar a698d99703 string.c: restore documentation for String#<<
* string.c: [DOC] restore documentation for String#<<
  which became undocumented with r56021; fix a typo.
  [ruby-core:79865] [Bug #13268]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57758 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-02 10:31:56 +00:00
normal 4e90dcc9d7 string.c (str_uminus): deduplicate strings
This exposes the rb_fstring internal function to return a
deduped and frozen string when a non-frozen string is given.
This is useful for all sorts of record processing where arbitrary
keys and values may be stored, but certain keys and values are often
duplicated at a high frequency, so memory savings can be
noticeable.

Use cases are many:

* email/NNTP header processing

  There are some standard header keys everybody uses
  (From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
  as well as common ones specific to certain lists:
  (ruby-core has X-Redmine-* headers)
  It is also useful to dedupe values, as most inboxes have
  multiple messages from the same sender, or MUA.

* package management systems -
  things like RubyGems store identical strings for licenses,
  dependency names, author names/emails, etc

* HTTP headers/trailers -
  standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
  are common, but there are also uncommon ones.
  Values may be deduped, as well, as it is likely a user
  agent will make multiple/parallel requests to the same
  server.

* version control systems -
  this can be useful for deduplicating names of frequent
  committers (like "nobu" :)

  In linux.git and git.git, there are also common
  trailers such as Signed-Off-By/Acked-by/Reviewed-by/Fixes/...
  as well as less common ones.

* audio metadata -

  There are commonly used tags (Artist/Album/Title/Tracknumber),
  but Vorbis comments allow arbitrary key values to be stored.
  Music collections contain songs by the same artist or multiple
  songs from the same album, so deduplicating values will be
  helpful there, too.

* JSON, YAML, XML, HTML processing

  Certain fields, tags and attributes are commonly used
  across the same and multiple documents

There is no security concern in this being a DoS vector by
causing immortal strings.  The fstring table is not a GC-root
and not walked during the mark phase.  GC-able dynamic symbols
since Ruby 2.2 are handled in the same manner, and that
implementation also relies on the non-immortality of fstrings.

[Feature #13077] [ruby-core:79663]
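
The effect described above can be observed from Ruby. This is a minimal sketch assuming Ruby 2.5 or later; the header name is just an illustrative key built at run time.

```ruby
# Two separately built strings with the same content.
first  = "X-Redmine-" + "Project"
second = "X-Redmine-" + "Project"

# String#-@ returns a frozen, deduplicated string, so equal contents
# map to the very same object.
a = -first
b = -second

p a.equal?(b)          # => true
p a.frozen?            # => true
p first.equal?(second) # => false
```

Storing `-key` rather than `key` in long-lived records keeps a single copy of each repeated key.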

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-24 01:01:23 +00:00
nobu 7de42daa21 string.c: assertion
* string.c (str_shared_replace): use RUBY_ASSERT for
  pre-condition.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14 12:29:56 +00:00
nobu 957e6e4b14 initialize variables
* string.c (rb_str_enumerate_lines): initialize conditionally
  used variable.

* thread.c (rb_fd_no_init): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57625 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14 07:52:30 +00:00
nobu 959aac29e7 suppress warnings
* string.c (rb_str_enumerate_lines): hint to suppress a
  maybe-uninitialized warning by gcc.

* thread.c (rb_fd_no_init): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-13 05:44:15 +00:00
normal a6b9b360ce doc: Add example for Symbol#to_s
* string.c: add example for Symbol#to_s.

The docs for Symbol#to_s only include an example for
Symbol#id2name, but not for #to_s which is an alias;
the docs should include examples for both methods.

From: Marcus Stollsteimer <sto.mar@web.de>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-05 00:22:03 +00:00
normal b0cfa46bce symbol.c (rb_id2str): eliminate branch to set class
Since the fstring table encompasses all strings in the
symbol table, we may reuse the fstring table walk to set
the class and eliminate the branch in rb_id2str.

* string.c (Init_String): use rb_cString immediately after definition
* symbol.c (rb_id2str): eliminate branch to set class

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-03 23:55:06 +00:00
normal 5c988df0dd string.c (rb_str_tmp_frozen_release): release embedded strings
Handle the embedded case first, since we may have an embedded
duplicate and non-embedded original string.

* string.c (rb_str_tmp_frozen_release): handled embedded strings
* test/ruby/test_io.rb (test_write_no_garbage): new test
  [ruby-core:78898] [Bug #13085]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30 21:54:32 +00:00
normal 9c4ba969a5 io.c: recycle garbage on write
* string.c (STR_IS_SHARED_M): new flag to mark shared multiple times
  (STR_SET_SHARED): set STR_IS_SHARED_M
  (rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions
  (str_new_frozen): set/unset STR_IS_SHARED_M as appropriate
* internal.h: declare new functions
* io.c (fwrite_arg, fwrite_do, fwrite_end): new
  (io_fwrite): use new functions

Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release
to manage a hidden, frozen string.  Reuse one bit of the embed
length for shared strings as STR_IS_SHARED_M to indicate a string
has been shared multiple times.  In the common case, the string
is only shared once so the object slot can be reclaimed immediately.

minimum results in each 3 measurements. (time and size)

Execution time (sec)
name                            trunk   built
io_copy_stream_write            0.682   0.254
io_copy_stream_write_socket     1.225   0.751

Speedup ratio: compare with the result of `trunk' (greater is better)
name    built
io_copy_stream_write            2.680
io_copy_stream_write_socket     1.630

Memory usage (last size) (B)
name                            trunk           built
io_copy_stream_write            95436800.000    6512640.000
io_copy_stream_write_socket     117628928.000   7127040.000

Memory consuming ratio (size) with the result of `trunk' (greater is better)
name    built
io_copy_stream_write            14.654
io_copy_stream_write_socket     16.505

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30 20:40:18 +00:00
shugo d33726b837 string.c: rindex(//) should set $~.
This seems to be a bug introduced by r520 (1.4.0).  [ruby-core:79110] [Bug #13135]
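
A small sketch of the fixed behaviour, with an illustrative string: after rindex with a Regexp, the last-match variable describes the match that was found.

```ruby
# rindex finds the last "l" in "hello".
pos = "hello".rindex(/l/)     # => 3

# $~ now holds the MatchData for that match.
matched_text  = $~[0]         # => "l"
matched_start = $~.begin(0)   # => 3

p [pos, matched_text, matched_start]
```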

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-19 08:13:03 +00:00
nobu 803621f6d7 file.c: refine message
* file.c (rb_get_path_check_convert): refine the error message
  when the path name contains null byte.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-16 02:43:55 +00:00
nobu 9029464175 string.c: replacement and block
* string.c (rb_enc_str_scrub): only one of replacement and block
  is allowed.  [ruby-core:79038] [Bug #13119]
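
The rule can be sketched like this, assuming a Ruby version that includes this change; the broken string and replacement texts are illustrative.

```ruby
# A UTF-8 string containing one invalid byte.
broken = "abc\xFFdef"

# Either a replacement string...
with_replacement = broken.scrub("?")
# ...or a block that receives the invalid bytes.
with_block = broken.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }

# Passing both is rejected.
begin
  broken.scrub("?") { |bytes| "!" }
  both = :accepted
rescue ArgumentError
  both = :rejected
end

p with_replacement   # => "abc?def"
p with_block         # => "abc<ff>def"
p both
```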

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 02:31:02 +00:00
nobu a3aa4da773 string.c: yield invalid part
* string.c (rb_enc_str_scrub): yield the invalid part only with
  ASCII-incompatible.  [ruby-core:79039] [Bug #13120]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57303 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 02:18:45 +00:00
nobu c763f0fb9b string.c: block for scrub with ASCII-incompatible
* string.c (rb_enc_str_scrub): honor the given block with
  ASCII-incompatible encoding.  [ruby-core:79039] [Bug #13120]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57302 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 01:03:37 +00:00
nobu 10bd48e402 string.c: CRLF in paragraph mode
* string.c (rb_str_enumerate_lines): allow CRLF to separate
  paragraphs.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25 23:56:55 +00:00
nobu 091f99b4b9 string.c: consistent paragraph mode with IO
* string.c (rb_str_enumerate_lines): in paragraph mode, do not
  include newlines which separate paragraphs, so that it will be
  consistent with IO#each_line.
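
Paragraph mode is selected by passing an empty string as the separator. The sketch below uses an illustrative input with one blank line between paragraphs; the second call only prints what the running interpreter does with a longer run of newlines, without asserting a result.

```ruby
text = "a\nb\n\nc\n"

# Each paragraph is yielded with the newline pair that ends it.
paragraphs = text.each_line("").to_a   # => ["a\nb\n\n", "c\n"]
p paragraphs

# A longer run of newlines between paragraphs, for comparison.
p "a\n\n\n\nb\n".each_line("").to_a
```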

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57184 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25 23:50:09 +00:00
nobu d124fa3a35 string.c: suppress a warning
* string.c (rb_str_casecmp_p): [DOC] use Unicode escape form to
  get rid of warning C4819 by Microsoft Visual C++.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-22 22:16:19 +00:00
rhe 44ba4fd362 string.c: add missing size_t cast
Add size_t cast to avoid signed integer overflow. r56157 ("string.c:
avoid signed integer overflow", 2016-09-13) missed this. Suppresses
UBSan.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-20 06:53:45 +00:00
nobu 5d6292809f no crypt.h on FreeBSD 12
* string.c (crypt.h): crypt_r() was added in FreeBSD 12.0 but is
  declared in unistd.h.  [ruby-core:78664] [Bug #13038]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16 05:05:42 +00:00
nobu 75755ef159 fix chomping newline only line
* string.c (chomp_newline): fix chomping newline only line.
  rb_enc_prev_char returns NULL if there is no previous character, and
  rb_enc_ascget must not be called on it.  a patch by Ary Borenszweig
  <asterite AT gmail.com> at [ruby-core:78666].  [Bug #13037]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16 01:12:09 +00:00
nobu c95388a58d string.c: fix method name in rdoc [ci skip]
* string.c (rb_str_equal): [DOC] fix fallback method name. the
  peer's == method will be used, not ===.
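
The fallback can be sketched as follows. The `Label` class is hypothetical and exists only for this illustration: it responds to to_str, so String#== delegates to its own == method.

```ruby
# Hypothetical string-like class used only for this illustration.
class Label
  attr_reader :seen

  def initialize(text)
    @text = text
    @seen = []
  end

  def to_str
    @text
  end

  # Records what it was compared with, to show that this method is called.
  def ==(other)
    @seen << other
    @text == other.to_str
  end
end

label  = Label.new("abc")
result = ("abc" == label)

p result       # => true
p label.seen   # => ["abc"]
```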

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12 07:12:07 +00:00
nobu 6dd5ee752a String#match? and Symbol#match?
* string.c (rb_str_match_m_p): inverse of Regexp#match?.  based on
  the patch by Herwin Weststrate <herwin@snt.utwente.nl>.
  [Fix GH-1483] [Feature #12898]
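
A short usage sketch with illustrative values: like Regexp#match?, these methods return a boolean and leave the last-match variable alone.

```ruby
$~ = nil

hit   = "ruby".match?(/u/)       # => true
after = $~                       # => nil, no MatchData was created

# The optional second argument is the position to start from.
miss = "ruby".match?(/r/, 1)     # => false

# Symbols get the same method.
sym_hit = :ruby.match?(/by/)     # => true

p [hit, after, miss, sym_hit]
```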

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12 02:56:12 +00:00
nobu d95f5bc81a string.c: chomp option
* string.c (rb_str_enumerate_lines): implement chomp option.
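
The option can be sketched with an illustrative string; with `chomp: true` each yielded line has its line terminator removed.

```ruby
text = "first\nsecond\n"

kept    = text.each_line.to_a                # => ["first\n", "second\n"]
chomped = text.each_line(chomp: true).to_a   # => ["first", "second"]
lines   = text.lines(chomp: true)            # => ["first", "second"]

p kept
p chomped
p lines
```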

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-03 14:18:03 +00:00
duerst dacf977a42 Fix/improve documentation of String/Symbol#casecmp[?]
Fix documentation of String#casecmp? (examples didn't have the '?').
Add an example with non-ASCII characters. Clarify that casecmp,
unlike casecmp?, only does case-insensitivity on A-Z/a-z.
[ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29 10:45:54 +00:00
nobu 07fb750fd0 string.c: use xmalloc
* string.c (rb_str_casemap): use xmalloc simply instead of
  ALLOC_N.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29 03:06:01 +00:00
nobu 78b0d7ac1c string.c: fix zero-length array
* string.c (mapping_buffer): get rid of zero-length array member,
  which is not a part of C90.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 13:16:00 +00:00
nobu 196e8b4480 string.c: enable rdoc
* string.c (rb_str_casecmp_p): [DOC] move forward declaration of
  rb_str_downcase to enable rdoc.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 09:37:19 +00:00
duerst ad619e02c4 implement String/Symbol#casecmp? including Unicode case folding
* string.c: Implement String#casecmp? and Symbol#casecmp? by using
  String#downcase :fold for Unicode case folding. This does not include
  options such as :turkic, because these currently cannot be combined
  with the :fold option. This implements feature #12786.

* test/ruby/test_string.rb/test_symbol.rb: Tests for above.
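
The difference from casecmp can be sketched as follows; the strings are illustrative and written with escapes so the example does not depend on the source file's encoding.

```ruby
lower = "\u00e4\u00f6\u00fc"   # a, o, u with diaeresis, lowercase
upper = "\u00c4\u00d6\u00dc"   # the same letters, uppercase

# casecmp? applies Unicode case folding.
folded = lower.casecmp?(upper)       # => true

# casecmp only folds A-Z/a-z, so these strings compare as different.
ascii_only = lower.casecmp(upper)    # => 1

# Symbols get the same predicate.
sym = :abc.casecmp?(:ABC)            # => true

p [folded, ascii_only, sym]
```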

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 08:37:32 +00:00
nobu a2144bd72a chomp option
* io.c (extract_getline_opts): extract chomp option.
  [Feature #12553]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56581 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-05 07:28:09 +00:00
nobu 4e44f6ef86 [DOC] replace Fixnum with Integer [ci skip]
* numeric.c: [DOC] update document for Integer class.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-26 06:11:23 +00:00
nobu 3ba353fc1a Fixed typo [ci skip]
* string.c (rb_str_sub, rb_str_gsub): [DOC] 'backlash' should read
  'backslash'.  [Fix GH-1461]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-21 02:34:19 +00:00
usa c2dd2d268e * internal.h (ST2FIX): new macro to convert st_index_t to Fixnum.
a hash value of Object might be Bignum, but it causes many troubles
  especially when the Object is used as a key of a hash.  so I gave up
  doing so.

* array.c (rb_ary_hash): use above macro.

* bignum.c (rb_big_hash): ditto.

* hash.c (rb_obj_hash, rb_hash_hash): ditto.

* numeric.c (rb_dbl_hash): ditto.

* proc.c (proc_hash): ditto.

* re.c (rb_reg_hash, match_hash): ditto.

* string.c (rb_str_hash_m): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-04 16:25:01 +00:00
nobu 63d77c2a1b string.c: negative hash
* string.c (rb_str_hash_m): hash values may be negative.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56321 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01 22:51:23 +00:00
usa 7a44019031 * string.c (rb_str_hash_m): st_index_t is not guaranteed to be the same
size as int, and of course the value is also not guaranteed to fit in a
  Fixnum.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01 17:06:21 +00:00
nobu 8d501ec021 string.c: fast path of lstrip_offset
* string.c (lstrip_offset): add a fast path in the case of single
  byte optimizable strings, as well as rstrip_offset.
  [ruby-core:77392] [Feature #12788]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-26 05:10:56 +00:00
rhe 537fea9921 string.c: fix integer overflow in enc_strlen() and rb_enc_strlen_cr()
* string.c (enc_strlen, rb_enc_strlen_cr): Avoid signed integer
  overflow. The result type of a pointer subtraction may have the same
  size as long. This fixes String#size returning a negative value on
  i686-linux environment:

    str = "\x00" * ((1<<31)-2)
    str.slice!(-3, 3)
    str.force_encoding("UTF-32BE")
    str << 1234
    p str.size

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56247 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-26 02:09:50 +00:00
shyouhei 2fc5210f31 * internal.h (WARN_UNUSED_RESULT): moved to configure.in, to
actually check its availability rather than checking GCC's version.

	* configure.in (WARN_UNUSED_RESULT): moved to here.

	* configure.in (RUBY_FUNC_ATTRIBUTE): change function declaration
	  to return int rather than void, because it makes no sense for a
	  warn_unused_result attributed function to return void.

	  Funny thing however is that it also makes no sense for noreturn
	  attributed function to return int.  So there is a fundamental
	  conflict between them.  While I tested this, I confirmed both
	  GCC 6 and Clang 3.8 prefer int over void to correctly detect
	  necessary attributes under this setup.  Maybe subject to change
	  in future.

	* internal.h (UNINITIALIZED_VAR): renamed to MAYBE_UNUSED, then
	  moved to configure.in for the same reason we move
	  WARN_UNUSED_RESULT.

	* configure.in (MAYBE_UNUSED): moved to here.

	* internal.h (__has_attribute): deleted, because it has no use now.

	* string.c (rb_str_enumerate_lines): refactor macro rename.

	* string.c (rb_str_enumerate_bytes): ditto.

	* string.c (rb_str_enumerate_chars): ditto.

	* string.c (rb_str_enumerate_codepoints): ditto.

	* thread.c (do_select): ditto.

	* vm_backtrace.c (rb_debug_inspector_open): ditto.

	* vsnprintf.c (BSD_vfprintf): ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-16 06:15:55 +00:00
rhe 00fcd967d9 string.c: avoid signed integer overflow
The behavior on signed integer overflow is undefined. On platforms with
sizeof(long)==4, 'len + termlen' can easily overflow, where len is the
string length and termlen is the terminator length.

So, prevent the integer overflow by avoiding adding to a string length,
or casting to size_t before adding where the total size is passed to
{RE,}ALLOC*().

* string.c (STR_HEAP_SIZE, RESIZE_CAPA_TERM, str_new0, rb_str_buf_new,
  str_shared_replace, rb_str_init, str_make_independent_expand,
  rb_str_resize): Avoid overflow by casting the length to size_t. size_t
  should be able to represent LONG_MAX+termlen.

* string.c (rb_str_modify_expand): Check that the new length is in the
  range of long before resizing. Also refactor to use RESIZE_CAPA_TERM
  macro.

* string.c (str_buf_cat): Fix so that it does not create a negative
  length String. Also fix the condition for 'string sizes too big', the
  total length can be up to LONG_MAX.

* string.c (rb_str_plus): Check the resulting String length does not
  exceed LONG_MAX.

* string.c (rb_str_dump): Fix integer overflow. The dump result will be
  longer than the original String.
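
A minimal sketch of the arithmetic behind this fix, modelled with Ruby integers rather than C. The 32-bit limits and the helper wrap_int32 are assumptions for illustration only; real C gives no guarantee about what a signed overflow produces, and wrapping is just one commonly observed outcome.

```ruby
# Model of a platform where sizeof(long) == 4 (an assumption for illustration).
long_max = 2**31 - 1
size_max = 2**32 - 1

# Hypothetical helper: reinterpret an integer as a wrapped 32-bit signed value.
# Real C makes no such promise; signed overflow is undefined behavior.
def wrap_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

len = long_max - 1   # string length close to the limit
termlen = 4          # terminator length of a UTF-32 string

sum = len + termlen          # exact value, as Ruby integers never overflow
wrapped = wrap_int32(sum)    # what a wrapping signed long would hold

puts "as long:   #{wrapped}"
puts "as size_t: #{sum} (fits: #{sum <= size_max})"
```

The exact sum exceeds LONG_MAX but still fits in an unsigned 32-bit value, which is why casting the length to size_t before adding is enough for the allocation size.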

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13 12:33:16 +00:00
rhe eaba77154f string.c: rename STR_EMBEDABLE_P to STR_EMBEDDABLE_P
* string.c (STR_EMBEDDABLE_P): Renamed from STR_EMBEDABLE_P(). And use
  it in more places.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56155 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13 12:28:54 +00:00
nobu 2608f7d9c5 string.c: STR_EMBEDABLE_P
* string.c (STR_EMBEDABLE_P): extract the predicate macro that tells
  whether the given length fits in an embedded string, and fix
  possible integer overflow.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13 12:11:57 +00:00
nobu 8a64787632 string.c: fix integer overflow
* string.c (rb_str_change_terminator_length): fix integer overflow
  in the case growing the terminator length and the string length
  is around LONG_MAX.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13 08:12:54 +00:00
rhe be3baa4380 string.c: fix buffer overflow check condition in rb_str_set_len()
* string.c (rb_str_set_len): The buffer overflow check is wrong. The
  space for termlen is allocated outside the capacity returned by
  rb_str_capacity(). This fixes r41920 ("string.c: multi-byte
  terminator", 2013-07-11).  [ruby-core:77257] [Bug #12757]

* test/-ext-/string/test_set_len.rb (test_capacity_equals_to_new_size):
  Test for this change. Applying only the test will trigger [BUG].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-13 07:08:15 +00:00
akr 577de1e93d replace fixnum by integer in documents.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56102 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-08 04:57:49 +00:00
nobu 9387ff7315 multiple arguments
* array.c (rb_ary_concat_multi): take multiple arguments.  based
  on the patch by Satoru Horie.  [Feature #12333]
* string.c (rb_str_concat_multi, rb_str_prepend_multi): ditto.
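
A short usage sketch of the multi-argument forms, assuming a Ruby version that includes this feature (2.4 or later). The unary plus only ensures the receivers are unfrozen.

```ruby
s = +"a"
s.concat("b", "c")     # each argument is appended in order

t = +"c"
t.prepend("a", "b")    # the arguments end up in front, in the given order

a = [1]
a.concat([2], [3])     # Array#concat accepts several arrays as well

p s, t, a
```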

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-27 01:26:17 +00:00
nobu c2bf7e6f7d string.c: rb_fs_setter
* string.c (rb_fs_setter): check and convert $; value at
  assignment.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-23 01:15:04 +00:00
nobu bd6fe32691 string.c: $; name in error message
* string.c (rb_str_split_m): show $; name in error message when it
  is a wrong object.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55986 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-22 17:10:00 +00:00
duerst 31040a307e * string.c (String#downcase), NEWS: Mentioned that case mapping for all
of ISO-8859-1~16 is now supported. [ci skip]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55777 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-30 03:13:28 +00:00
nobu c463366dfd rb_funcallv
* *.c: rename rb_funcall2 to rb_funcallv, except for extensions
  which are/will be/may be gems.  [Fix GH-1406]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-29 11:57:14 +00:00
ko1 9f60791a04 * vm_core.h: revisit the structure of frame, block and env.
[Bug #12628]

  This patch introduces many changes.

  * Introduce concept of "Block Handler (BH)" to represent
    passed blocks.

  * move rb_control_frame_t::flag to ep[0] (as a special local
    variable). This flag represents not only frame type, but also
    env flags such as escaped.

  * rename `rb_block_t` to `struct rb_block`.

  * Make Proc, Binding and RubyVM::Env objects wb-protected.

  Check [Bug #12628] for more details.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-28 11:02:30 +00:00
nobu a325876ad3 Fix Issues reported by PVS-Studio static analyzer
* vm.c (vm_set_main_stack): remove unnecessary check.  toplevel
  binding must be initialized.  [Bug #12611] (N1)
* win32/win32.c (w32_symlink): fix return type.  [Bug #12611] (N3)
* string.c (rb_str_split_m): simplify the condition.
  [Bug #12611](N4)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55729 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-22 10:55:22 +00:00
duerst c6692d9410 * string.c (String#dump): Change escaping of non-ASCII characters in
UTF-8 to use upper-case four-digit hexadecimal escapes without braces
  where possible [Feature #12419].
* test/ruby/test_string.rb (test_dump): Add tests for above.
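
A small illustration of the escaping described above, assuming a Ruby version that includes this change (2.4 or later). Characters that fit in four hexadecimal digits are dumped without braces; others keep the braced form.

```ruby
bmp    = "\u00E9"     # fits in four hex digits
astral = "\u{1F600}"  # needs more than four hex digits

puts bmp.dump      # four-digit, upper-case, no braces
puts astral.dump   # braces are still required here
```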


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55728 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-22 08:13:38 +00:00
ngoto 20c4461d86 * string.c (str_buf_cat): Fix potential integer overflow of capa.
In addition, termlen is used instead of +1.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 13:08:54 +00:00
ngoto 2bb292fccf * string.c (str_buf_cat): Fix capa size for embed string.
Fix bug in r55547. [Bug #12536]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 12:35:52 +00:00
normal ed5401a696 string.c: reduce malloc overhead for default buffer size
* string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127
  [ruby-core:76371] [Feature #12025]
* string.c (rb_str_buf_new): adjust for above reduction

From Jeremy Evans <code@jeremyevans.net>:

This changes the minimum buffer size for string buffers from 128 to
127.  The underlying C buffer is always 1 more than the ruby buffer,
so this changes the actual amount of memory used for the minimum
string buffer from 129 to 128.  This makes it much easier on the
malloc implementation, as evidenced by the following code (note that
time -l is used here, but Linux systems may need time -v).

$ cat bench_mem.rb
i = ARGV.first.to_i
Array.new(1000000){" " * i}
$ /usr/bin/time -l ruby bench_mem.rb 128
        3.10 real         2.19 user         0.46 sys
    289080  maximum resident set size
     72673  minor page faults
        13  block output operations
        29  voluntary context switches
$ /usr/bin/time -l ruby bench_mem.rb 127
        2.64 real         2.09 user         0.27 sys
    162720  maximum resident set size
     40966  minor page faults
         2  block output operations
         4  voluntary context switches

To try to ensure a power-of-2 growth, when a ruby string capacity
needs to be increased, after doubling the capacity, add one.  This
ensures the ruby capacity will be odd, which means actual amount
of memory used will be even, which is probably better than the
current case of the ruby capacity being even and the actual amount
of memory used being odd.

A very similar patch was proposed 4 years ago in feature #5875. It
ended up being rejected, because no performance increase was shown.
One reason for that is that ruby does not use STR_BUF_MIN_SIZE
unless rb_str_buf_new is called, and that previously did not have
a ruby API, only a C API, so unless you were using a C extension
that called it, there would be no performance increase.

With the recently proposed feature #12024, String.buffer is added,
which is a ruby API for creating string buffers.  Using
String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage.  As measured above, memory usage is 44% less,
and performance is 17% better.
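
The doubling-plus-one rule from the message can be checked with plain arithmetic. This is a sketch only: grow and power_of_two? are hypothetical helper names, not functions from string.c.

```ruby
# Hypothetical helper: double the ruby capacity, then add one.
def grow(capa)
  capa * 2 + 1
end

# Hypothetical helper: true when n is a power of two.
def power_of_two?(n)
  n > 0 && (n & (n - 1)).zero?
end

min_capa = 127            # STR_BUF_MIN_SIZE after this change
capas = [min_capa]
3.times { capas << grow(capas.last) }

capas.each do |capa|
  c_buffer = capa + 1     # the C buffer is one byte larger than the ruby capacity
  puts "ruby capa=#{capa} C buffer=#{c_buffer} power of two=#{power_of_two?(c_buffer)}"
end
```

Starting from 127, every capacity in the sequence stays odd, so the underlying allocation stays a power of two.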

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14 23:30:29 +00:00
ngoto 5eff15d1bd * string.c (rb_str_change_terminator_length): New function to change
termlen and resize heap for the terminator. This is split from
  rb_str_fill_terminator (str_fill_term) because filling terminator
  and changing terminator length are different things. [Bug #12536]

* internal.h: declaration for rb_str_change_terminator_length.

* string.c (str_fill_term): Simplify only to zero-fill the terminator.
  For non-shared strings, it assumes that (capa + termlen) bytes of
  heap is allocated. This partially reverts r55557.

* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
  is used, and it should be called whenever the termlen is changed.

* string.c (str_capacity): New static function to return capacity
  of a string with the given termlen, because the termlen may
  sometimes be different from TERM_LEN(str) especially during
  changing termlen or filling terminator with specific termlen.

* string.c (rb_str_capacity): Use str_capacity.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-05 10:45:23 +00:00
ngoto 3418a277d8 * string.c: Partially reverts r55547 and r55555.
ChangeLog entries about the reverted changes are also deleted in this file.
  [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 18:11:11 +00:00
ngoto 61f2ee0d90 * string.c (str_fill_term): When termlen increases, re-allocation
of memory for termlen should always be needed.
  In this fix, if possible, decrease capa instead of realloc.
  [Bug #12536] [ruby-dev:49699]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 17:32:21 +00:00
ngoto a92a537bf4 * string.c: Specify termlen as far as possible.
Additional fix for [Bug #12536] [ruby-dev:49699].

* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
  which is apparently 1 for the encodings.

* string.c (str_new0_cstr): New static function to create a String
  object from a C string with a specified termlen.

* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
  termlen by using new str_new0_cstr().

* string.c (str_new_static): Specify termlen from the given encoding
  when creating a new String object is needed.

* string.c (rb_tainted_str_new_with_enc): New function to create a
  tainted String object with the given encoding. This means that
  the termlen is correctly specified. Currently a static function.
  The function name might be renamed to rb_tainted_enc_str_new
  or rb_enc_tainted_str_new.

* string.c (rb_external_str_new_with_enc): Use encoding by using the
  above rb_tainted_str_new_with_enc().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 11:24:11 +00:00
ngoto 10e28726a1 * string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
is used, TERM_LEN(str) should be considered with it because
  embedded strings are also processed by TERM_FILL.
  Additional fix for [Bug #12536] [ruby-dev:49699].


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 04:50:38 +00:00
ngoto 6734a0c3d9 string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-01 03:58:51 +00:00
ngoto f2ee22371b * string.c: Fix memory corruptions when using UTF-16/32 strings.
[Bug #12536] [ruby-dev:49699]

* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
  the same as largest value of rb_enc_mbminlen(enc) among encodings.

* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
  +TERM_LEN_MAX bytes instead of +1. This change may increase memory
  usage.

* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
  of termlen.

* string.c (str_shared_replace): Copy +termlen bytes instead of +1.

* string.c (rb_str_times): termlen should not be included in capa.

* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
  termlen should be counted with it because embedded strings are
  also processed by TERM_FILL.

* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-30 10:20:23 +00:00
nobu bcf0a198f1 CASEMAP_DEBUG [ci skip]
* string.c (rb_str_casemap, rb_str_ascii_casemap): move
  debug/tuning messages under a preprocessor condition,
  CASEMAP_DEBUG.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 08:19:59 +00:00
nobu 3a6bb56029 Fix garbage allocation
* string.c (rb_str_casemap): do not put code with side effects
  inside RSTRING_PTR() macro which evaluates the argument multiple
  times.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:38:16 +00:00
naruse 8272729977 * string.c (rb_str_casemap): fix memory leak.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55480 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:14:05 +00:00
naruse 9d291c82e5 * string.c (rb_str_casemap): int is too small for string size.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-21 07:14:04 +00:00
nobu 1cbc622ea7 string.c: adjust buffer size
* string.c (tr_trans): adjust buffer size by processed and rest
  lengths, instead of doubling repeatedly.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55428 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16 03:17:54 +00:00
nobu cc9f1e9195 string.c: fix terminator
* string.c (tr_trans): consider terminator length and fix heap
  overflow.  reported by Guido Vranken <guido AT guidovranken.nl>.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55427 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-16 02:15:27 +00:00
nobu aaf8c09900 Fix typo in string.c [ci skip]
* string.c (rb_str_oct): [DOC] fix typo, hornored -> honored.
  [Fix GH-1379]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55378 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11 06:02:46 +00:00
duerst 02f7ad6237 * enc/iso_8859_1.c: Implement non-ASCII case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above.
* string.c: Add iso-8859-1 to supported encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55373 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-11 00:46:21 +00:00
duerst 10174c295b * string.c: Special-case :ascii option in rb_str_capitalize_bang and
rb_str_swapcase_bang.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55361 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 08:35:17 +00:00
duerst 13f576d6b9 * string.c: Special-case :ascii option in rb_str_upcase_bang (retry).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55359 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 08:12:28 +00:00
nobu 2667d1b38f hash.c: ensure NUL-terminated for ENV
* hash.c (get_env_cstr): ensure NUL-terminated.
  [ruby-dev:49655] [Bug #12475]
* string.c (rb_str_fill_terminator): return the pointer to the
  NUL-terminated content.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55345 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-10 05:48:38 +00:00
kazu 075cf3d2e8 string.c (rb_str_ascii_casemap): fix compile error.
error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55332 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 14:11:17 +00:00
duerst 872f9a498f * string.c: Revert previous commit (possibility of endless loop).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 13:22:28 +00:00
duerst 5eb73eeda8 * string.c: Special-case :ascii option in rb_str_upcase_bang.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55330 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 12:57:44 +00:00
duerst f0fc6ec872 * string.c: New static function rb_str_ascii_casemap; special-casing
:ascii option in rb_str_upcase_bang and rb_str_downcase_bang.
* regenc.c: Fix a bug (wrong use of unnecessary slack at end of string).
* regenc.h -> include/ruby/oniguruma.h: Move declaration of
  onigenc_ascii_only_case_map so that it is visible in string.c.
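
A usage sketch of the :ascii option from the Ruby side, assuming a Ruby version where full Unicode case mapping is the default (2.4 or later). With :ascii, only A-Z and a-z are affected.

```ruby
s = "\u00C4BC"                 # "ÄBC"

full  = s.downcase             # Unicode-aware mapping
ascii = s.downcase(:ascii)     # non-ASCII characters are left untouched

p full, ascii
p "\u00E4bc".upcase(:ascii)    # same restriction for upcase
```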


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-08 12:28:42 +00:00
duerst 8743f010c6 * string.c (rb_str_upcase_bang, rb_str_capitalize_bang,
rb_str_swapcase_bang): Switch to use primitive.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55310 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07 08:18:42 +00:00
duerst 53a3e3ddd9 * string.c (rb_str_downcase_bang): Switch to use primitive except if
conversion can be done ASCII-only.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-07 07:44:19 +00:00
duerst ab5f23f26c * string.c: Added UTF-16BE/LE and UTF-32BE/LE to supported encodings
for Unicode case mapping.
* test/ruby/enc/test_case_comprehensive.rb: Tests for above
  functionality; fixed an encoding issue in assertion error message.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55296 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06 09:36:36 +00:00
duerst 2f49aa8f62 * string.c: Change rb_str_casemap to use encoding primitive
case_map instead of directly calling onigenc_unicode_case_map.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-06 04:37:10 +00:00
duerst c5ea268264 * string.c: Remove :lithuanian guard for Unicode case mapping.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55277 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-05 05:46:37 +00:00
nobu 40c3c3ec6c crypt.h: remove initialized
* missing/crypt.h (struct crypt_data): remove unnecessary member
  "initialized".
* missing/crypt.c (des_setkey_r): nothing to be initialized in
  crypt_data.
* configure.in (struct crypt_data): check for "initialized" in
  struct crypt_data, which may be only in glibc, and isn't on AIX
  at least.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-04 01:54:54 +00:00
duerst 3dd98b2446 * string.c: Raise ArgumentError when invalid string is detected in
case mapping methods.
* enc/unicode.c: Check for invalid string and signal with negative
  length value.
* test/ruby/enc/test_case_mapping.rb: Add tests for above.
* test/ruby/test_m17n_comb.rb: Add a message to clarify test failure.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-02 01:24:52 +00:00
nobu a94201243e string.c: fallback to crypt_r
* string.c: prefer crypt_r to crypt iff neither the system crypt nor
  crypt_r is provided.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 13:17:31 +00:00
nobu a8bfa9bdf1 use system crypt
* configure.in: revert r55237.  replace crypt, not crypt_r, and
  check if crypt is broken more.
* missing/crypt.c: move crypt_r.c
* string.c (rb_str_crypt): use crypt_r if provided by the system.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55245 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 06:58:21 +00:00
nobu 3c31685e11 use crypt_r
* string.c (rb_str_crypt): use reentrant crypt_r.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55237 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-01 00:48:08 +00:00
naruse e6ff652ce8 Revert r55225
Run test-all before large commit:
"* string.c: Activate full Unicode case mapping for UTF-8 by removing"

This reverts commit 3fb0fcd1e8.
http://rubyci.s3.amazonaws.com/centos5-64/ruby-trunk/log/20160531T013303Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55226 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31 02:56:09 +00:00
duerst 3fb0fcd1e8 * string.c: Activate full Unicode case mapping for UTF-8 by removing
the protective check for the presence of an option.
  Update documentation.
* test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-31 01:10:06 +00:00
duerst ae4fba3167 * string.c: Document current behavior for other case mapping methods
on String. [ci skip]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 12:15:41 +00:00
duerst 85950c5257 * string.c: Document current situation for String#downcase. [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55215 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 11:00:26 +00:00
nobu 79a85b18cc string.c: return reallocated pointer
* string.c (str_fill_term): return new pointer reallocated by
  filling terminator.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 07:20:28 +00:00
nobu 9ac5f9135a string.c: get rid of unnecessary empty string
* string.c (str_substr, rb_str_aref): refactor not to create
  unnecessary empty string.
* string.c (str_byte_substr, str_byte_aref): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55209 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 05:50:27 +00:00
nobu e3e8cae9be string.c: check in the order
* string.c (rb_str_aref_m, rb_str_byteslice): check arguments in
  the left-to-right order.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55208 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-30 05:41:02 +00:00
nobu 4fad63da01 transcode.c: scrub in the given encoding
* transcode.c (str_transcode0): scrub in the given encoding when
  the source encoding is given, not in the encoding of the
  receiver.  [ruby-core:75732] [Bug #12431]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-27 08:09:46 +00:00
nobu b493d156de string.c: integer overflow
* string.c (rb_str_modify_expand): check integer overflow.
  [ruby-core:75592] [Bug #12390]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18 05:52:40 +00:00
nobu 4a9705d6e3 ruby.h: RB_INTEGER_TYPE_P
* include/ruby/ruby.h (RB_INTEGER_TYPE_P): new macro and
  underlying inline function to check if the object is an
  Integer (Fixnum or Bignum).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-18 01:17:43 +00:00
naruse 28f5e12c24 * configure.in: check function attribute const and pure,
and define CONSTFUNC and PUREFUNC if available.
  Note that I don't add those options as default because
  it still shows many false positives (it seems not to consider
  longjmp).

* vm_eval.c (stack_check): get rb_thread_t* as an argument
  to avoid duplicate call of GET_THREAD().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-08 17:44:51 +00:00
yui-knk deca1d8007 * string.c (rb_str_sub): Fix a special match variable name.
[ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-05 05:39:35 +00:00
naruse cdef0bc833 * string.c (count_utf8_lead_bytes_with_word): Use __builtin_popcount
only if it can use SSE 4.2 POPCNT whose latency is 3 cycles.

* internal.h (rb_popcount64): use __builtin_popcountll because now
  it is in fast path.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-03 13:14:30 +00:00
nobu c353ec0c9e string.c: shortcut
* string.c (rb_str_concat): shortcut concatenation to ASCII-8BIT
  as well as US-ASCII.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54882 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 03:58:28 +00:00
nobu 321c6df89b string.c: fix doc
* string.c (rb_str_concat): [DOC] fix the indefinite article, for
  replacement from Fixnum to Integer.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 03:53:34 +00:00
nobu 0e3475a6d9 string.c: fix braces
* string.c (search_nonascii): fix braces unmatched by a
  preprocessing condition.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-02 00:06:04 +00:00
naruse 2fc973796a fix mixed declaration on non UNALIGNED_WORD_ACCESS
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54877 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 18:27:41 +00:00
naruse 64837f778a fix for where UNALIGNED_WORD_ACCESS is not allowed
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54867 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 14:19:02 +00:00
naruse db2c32778d Use WORDS_BIGENDIAN
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54862 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-01 09:07:14 +00:00
naruse 0f0121fe1a * string.c (search_nonascii): use nlz on big endian environments.
* internal.h (nlz_intptr): defined.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54859 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 22:32:05 +00:00
naruse 424a706afe More optimization for r54854's search_nonascii
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54857 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 16:32:36 +00:00
naruse 4cf460a7bb * string.c (search_nonascii): unroll and use ntz
* configure.in (__builtin_ctz): check.

* configure.in (__builtin_ctzll): check.

* internal.h (rb_popcount32): defined for ntz_int32.
  it can use __builtin_popcount but this function is not used on
  GCC environment because it uses __builtin_ctz.
  When another function uses this, using __builtin_popcount
  should be re-considered.

* internal.h (rb_popcount64): ditto.

* internal.h (ntz_int32): defined for ntz_intptr.

* internal.h (ntz_int64): defined for ntz_intptr.

* internal.h (ntz_intptr): defined as ntz for uintptr_t.
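
The word-at-a-time idea can be modelled in Ruby as follows. This is a sketch of the technique only, not the C code from string.c: it assumes 8-byte little-endian words, returns a byte index instead of a pointer, and the method names merely echo the C ones.

```ruby
# Number of trailing zero bits of a positive integer.
def ntz(x)
  (x & -x).bit_length - 1
end

# Index of the first byte >= 0x80, or nil when the string is pure ASCII.
def search_nonascii(str)
  mask = 0x8080_8080_8080_8080        # high bit of each of the 8 bytes
  bytes = str.b
  full_words = bytes.bytesize / 8

  # Test 8 bytes at once; "Q<" reads little-endian 64-bit words.
  bytes.unpack("Q<#{full_words}").each_with_index do |word, i|
    hit = word & mask
    # The lowest set bit belongs to the first non-ASCII byte in this word.
    return i * 8 + ntz(hit) / 8 unless hit.zero?
  end

  # Remaining tail bytes are checked one by one.
  (full_words * 8...bytes.bytesize).each do |i|
    return i if bytes.getbyte(i) >= 0x80
  end
  nil
end

p search_nonascii("ab\u00E9cdefgh")
p search_nonascii("plain ascii text")
```

Counting trailing zeros locates the first offending byte within a word without a per-byte loop, which is what makes the unrolled version worthwhile on little-endian machines.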

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-30 15:39:02 +00:00
nobu a491508753 string.c: rb_str_concat_literals
* string.c (rb_str_concat_literals): concatenate literal string
  fragments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54490 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-05 08:15:22 +00:00
nobu 0f32783976 string.c: skip invalid char gap
* string.c (enc_succ_alnum_char): try to skip an invalid character
  gap between GREEK CAPITAL RHO and SIGMA.
  [ruby-core:74478] [Bug #12204]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54210 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-21 10:09:33 +00:00
nobu 49a272d728 string.c: Symbol#match
* string.c (sym_match_m): delegate to String#match but not
  String#=~.  [ruby-core:72864] [Bug #11991]
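
A brief illustration of the difference, assuming a Ruby version that includes this fix (2.4 or later): Symbol#match returns a MatchData, while Symbol#=~ returns the match position.

```ruby
m   = :ruby_lang.match(/_(\w+)/)   # MatchData, as with String#match
pos = :ruby_lang =~ /_(\w+)/       # Integer position, as with String#=~

p m.class, m[1], pos
```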

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53866 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-18 12:06:20 +00:00
nobu 5a6a502ef9 string.c: fix rb_str_init
* string.c (rb_str_init): fix segfault and memory leak, consider
  wide char encoding terminator.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53855 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 11:24:09 +00:00
naruse d092fc5398 Additional fix and tests for r53851
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 10:15:28 +00:00
nobu b6053df008 remove unnecessary declaration so that rdoc works
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53852 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 07:37:20 +00:00
naruse 49dee548f4 fix rubyspec error from r53850
http://rubyci.s3.amazonaws.com/tk2-243-31075/ruby-trunk/log/20160217T061402Z.fail.html.gz

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 07:24:13 +00:00
naruse d46e2aea71 * string.c (rb_str_init): introduce String.new(capacity: size)
[Feature #12024]
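
A usage sketch of the capacity keyword. The buffer size of 4096 is an arbitrary value chosen for illustration; the keyword only preallocates storage and does not change the string's content or length.

```ruby
buf = String.new(capacity: 4096)   # empty string with preallocated storage
initially_empty = buf.empty?

100.times { |i| buf << "line #{i}\n" }

puts initially_empty
puts buf.lines.size
```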

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-17 03:21:35 +00:00
duerst 2ca7569c6d * string.c, enc/unicode.c: Disassociating ONIGENC_CASE_FOLD flag from
ONIGENC_CASE_DOWNCASE.
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 11:44:12 +00:00
nobu 1bea5a6127 string.c: remove magic number
* string.c (rb_str_dump): share same string literal instead of a
  magic number.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53774 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 03:44:48 +00:00
nobu 6442f02176 string.c: use encoding index
* string.c (rb_external_str_with_enc, rb_str_concat, rb_str_dump):
  use encoding index as shortcut without rb_encoding.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-08 03:41:16 +00:00
nobu 94c70c7d72 fstring_enc_new
* string.c (rb_fstring_enc_new, rb_fstring_enc_cstr): functions to
  make fstring with encoding.
* re.c (rb_reg_initialize): make fstring without copying.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-04 06:35:34 +00:00
naruse 040ce05610 * string.c (str_new_frozen): if the given string is embeddable
but not embedded, embed a new copied string. [Bug #11946]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53724 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-03 04:52:13 +00:00
naruse 21daa56b2a * re.c: Introduce RREGEXP_PTR.
patch by dbussink.
  partially merge https://github.com/ruby/ruby/pull/497

* include/ruby/ruby.h: ditto.

* gc.c: ditto.

* ext/strscan/strscan.c: ditto.

* parse.y: ditto.

* string.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53715 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-02 04:39:44 +00:00
nobu 439224a590 RUBY_ASSERT
* error.c (rb_assert_failure): assertion with stack dump.
* ruby_assert.h (RUBY_ASSERT): new header for the assertion.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53615 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-22 08:33:55 +00:00
hsbt 4c6713f374 * string.c: fix a typo. [fix GH-1202][ci skip] Patch by @sunboshan
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53571 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-18 02:48:24 +00:00
duerst e580847ce8 * string.c: Any kind of option is now taking the new code path for
upcase/downcase/capitalize/swapcase. :lithuanian can be used for
  testing if no specific option is desired.
* test/ruby/enc/test_case_mapping.rb: Adjusted to above.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 11:40:46 +00:00
duerst 959bbb6f72 * enc/unicode.c: Removed artificial expansion for Turkic,
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
  capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
  (with Kimihito Matsui)
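
A short sketch of the Turkic behaviour as exposed in released Ruby versions (2.4 or later is assumed here; at the time of this commit the code path was still guarded by an option).

```ruby
dotted_capital = "i".upcase(:turkic)     # U+0130, capital I with dot above
dotless_small  = "I".downcase(:turkic)   # U+0131, dotless small i

p dotted_capital, dotless_small
p "i".upcase                             # default mapping for comparison
```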


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 08:42:16 +00:00
duerst c12af76763 * enc/unicode.c: Artificial mapping to test buffer expansion code.
* string.c: Fixed buffer expansion logic.
* test/ruby/enc/test_case_mapping.rb: Tests for above.
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53554 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 08:24:58 +00:00
hsbt 219467abde * enc/unicode.c: fix implicit conversion error with clang. fixup r53548.
* string.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:51:58 +00:00
svn 72fa5a8ee5 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53549 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:24:04 +00:00
duerst be897c2507 * string.c, enc/unicode.c: New code path as a preparation for Unicode-wide
case mapping. The code path is currently guarded by the :lithuanian
  option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:24:03 +00:00
duerst 4a5d3572e6 string.c: made a variable name more grammatically correct
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53510 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-12 09:42:07 +00:00