1334 Commits

Author SHA1 Message Date
duerst a5330fa9ea fix accidental reversal of r57997 in r58000
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58009 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-18 01:35:03 +00:00
duerst 67c1197835 clarify 'codepoint' in documentation of String#each_codepoint
Make sure it's clear that the returned values are not Unicode codepoints
for encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE).

[ci skip] [Bug #13321]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17 02:24:53 +00:00
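A minimal Ruby sketch of the point this commit documents, assuming a build that ships the Shift_JIS transcoder. The helper name `codepoints_in` is made up for illustration. For a non-Unicode encoding, the values yielded by String#each_codepoint are that encoding's own code values rather than Unicode codepoints:

```ruby
# Hypothetical helper (not part of Ruby): transcode +str+ to +encoding+
# and return the values that String#each_codepoint yields.
def codepoints_in(str, encoding)
  str.encode(encoding).each_codepoint.to_a
end

hiragana_a = "\u3042" # HIRAGANA LETTER A, U+3042

p codepoints_in(hiragana_a, "UTF-8")     # => [12354], i.e. 0x3042
p codepoints_in(hiragana_a, "Shift_JIS") # => [33440], i.e. 0x82A0
```

The same character therefore reports a different number once it leaves the UTF-8/UTF-16/UTF-32 family.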
normal 9eb94b4dc1 deduplicate static rb_str_format format strings
Anybody who hits these code paths can hit them again in the
future, so try deduplicating across multiple runs of these
methods to reduce garbage.

* string.c (str_upto_each): fstring on "%.*d"
* strftime.c (rb_strftime_with_timespec): fstring on "%0*d"

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57997 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-17 00:55:55 +00:00
nobu 8c661ba264 string.c: shortcut argument check
* string.c (str_casecmp, str_casecmp_p): split to skip the argument
  check when the argument is certainly a String.

* string.c (sym_casecmp, sym_casecmp_p): shortcut argument checks.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-15 07:57:11 +00:00
nobu 9fa56026e5 string.c: use rb_check_string_type
* string.c (rb_str_cmp_m): use rb_check_string_type for check and
  conversion, instead of calling the conversion method directly.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-14 03:42:43 +00:00
stomar e6dec8f92e docs for Symbol#casecmp and Symbol#casecmp?
* string.c: [DOC] improve docs of Symbol#casecmp and Symbol#casecmp?
  according to the similar String methods; fix RDoc markup and typos;
  fix call-seq's for Symbol#{upcase,downcase,capitalize,swapcase}.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57963 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 20:20:40 +00:00
nobu bd17e25588 string.c (rb_str_set_len): pathological check
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57961 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 11:47:45 +00:00
nobu 16e804117c string.c: $; is a GC-root
* string.c (Init_String): $; must be a GC-root so that it is not
  collected.  [ruby-core:79582]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57958 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-13 09:12:05 +00:00
stomar 605b472d2d docs for String#casecmp and String#casecmp?
* string.c: [DOC] specify when String#casecmp and String#casecmp?
  return nil; modify examples to better show difference to <=>;
  fix RDoc markup and typos.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-11 20:01:55 +00:00
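A small sketch of the behaviour these docs describe, assuming a Ruby version in which String#casecmp and String#casecmp? return nil for a non-String argument. The helper `compare_all` is hypothetical and only puts the three comparisons side by side:

```ruby
# Hypothetical helper: returns [<=>, casecmp, casecmp?] for the same pair.
def compare_all(a, b)
  [a <=> b, a.casecmp(b), a.casecmp?(b)]
end

p compare_all("aBcDeF", "abcdef") # => [-1, 0, true]
p compare_all("foo", 1)           # => [nil, nil, nil]
```

The first call shows the difference from `<=>`, which is case-sensitive. The second shows the nil result for an incomparable argument.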
normal e064879e5a string.c (str_uminus): update doc for deduplication
As of r57698, String#-@ can return pre-existing strings.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08 21:24:24 +00:00
nobu e7f4d90930 fix paren
* string.c (str_byte_substr): fix misplaced parenthesis at r56155.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57809 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-08 08:19:56 +00:00
kazu d0708e9e2a string.c: [DOC] Fix a typo in String#dump
[Fix GH-1531][ci skip]
Author:    Alex Semyonov <alex@semyonov.us>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57802 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 13:04:39 +00:00
nobu d69d98f61a string.c: negation of LONG_MIN
* string.c (rb_str_update): do not use negation of LONG_MIN, which
  is negative too.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 09:13:41 +00:00
nobu f4d13801b6 string.c: fix integer overflow
* string.c (str_byte_substr): fix another integer overflow which
  can happen only when SHARABLE_MIDDLE_SUBSTRING is enabled.
  [ruby-core:79951] [Bug #13289]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 09:07:57 +00:00
nobu 72f8df158f string.c: fix integer overflow
* string.c (rb_str_subpos): fix integer overflow which can happen
  only when SHARABLE_MIDDLE_SUBSTRING is enabled.  incorporate
  https://github.com/mruby/mruby/commit/7db0786abdd243ba031e24683f

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57797 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-07 05:48:15 +00:00
stomar 3ca1cbecc6 string.c: [DOC] fix doc formatting for String#==, #===
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-04 20:08:04 +00:00
stomar a698d99703 string.c: restore documentation for String#<<
* string.c: [DOC] restore documentation for String#<<
  which became undocumented with r56021; fix a typo.
  [ruby-core:79865] [Bug #13268]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57758 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-02 10:31:56 +00:00
normal 4e90dcc9d7 string.c (str_uminus): deduplicate strings
This exposes the rb_fstring internal function to return a
deduped and frozen string when a non-frozen string is given.
This is useful for all sorts of record processing where arbitrary
keys and values may be stored, but certain keys and values are
often duplicated at a high frequency, so the memory savings can
be noticeable.

Use cases are many:

* email/NNTP header processing

  There are some standard header keys everybody uses
  (From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
  as well as common ones specific to certain lists
  (ruby-core has X-Redmine-* headers)
  It is also useful to dedupe values, as most inboxes have
  multiple messages from the same sender, or MUA.

* package management systems -
  things like RubyGems store identical strings for licenses,
  dependency names, author names/emails, etc

* HTTP headers/trailers -
  standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
  are common, but there are also uncommon ones.
  Values may be deduped, as well, as it is likely a user
  agent will make multiple/parallel requests to the same
  server.

* version control systems -
  this can be useful for deduplicating names of frequent
  committers (like "nobu" :)

  In linux.git and git.git, there are also common
  trailers such as Signed-Off-By/Acked-by/Reviewed-by/Fixes/...
  as well as less common ones.

* audio metadata -

  There are commonly used tags (Artist/Album/Title/Tracknumber),
  but Vorbis comments allow arbitrary key values to be stored.
  Music collections contain songs by the same artist or multiple
  songs from the same album, so deduplicating values will be
  helpful there, too.

* JSON, YAML, XML, HTML processing

  Certain fields, tags and attributes are commonly used
  across the same and multiple documents

There is no security concern in this being a DoS vector by
causing immortal strings.  The fstring table is not a GC-root
and not walked during the mark phase.  GC-able dynamic symbols
since Ruby 2.2 are handled in the same manner, and that
implementation also relies on the non-immortality of fstrings.

[Feature #13077] [ruby-core:79663]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-24 01:01:23 +00:00
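A hedged Ruby sketch of the deduplication described above, assuming Ruby 2.5 or later, where this change shipped. The helper `deduplicated?` and the sample header name are illustrative only:

```ruby
# Hypothetical helper: true when unary minus maps two separately built,
# equal strings to the very same frozen object.
def deduplicated?(a, b)
  x = -a
  y = -b
  x.frozen? && y.frozen? && x.equal?(y)
end

first  = "Message" + "-ID"  # unfrozen, built at run time
second = "Message-" + "ID"  # equal content, different object

p first.equal?(second)         # => false
p deduplicated?(first, second) # => true
```

Applying `-` to every parsed header key would, under these assumptions, keep a single copy of each distinct key alive instead of one copy per record.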
nobu 7de42daa21 string.c: assertion
* string.c (str_shared_replace): use RUBY_ASSERT for
  pre-condition.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14 12:29:56 +00:00
nobu 957e6e4b14 initialize variables
* string.c (rb_str_enumerate_lines): initialize conditionally
  used variable.

* thread.c (rb_fd_no_init): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57625 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-14 07:52:30 +00:00
nobu 959aac29e7 suppress warnings
* string.c (rb_str_enumerate_lines): hint to suppress a
  maybe-uninitialized warning by gcc.

* thread.c (rb_fd_no_init): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-13 05:44:15 +00:00
normal a6b9b360ce doc: Add example for Symbol#to_s
* string.c: add example for Symbol#to_s.

The docs for Symbol#to_s only include an example for
Symbol#id2name, but not for #to_s which is an alias;
the docs should include examples for both methods.

From: Marcus Stollsteimer <sto.mar@web.de>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-05 00:22:03 +00:00
normal b0cfa46bce symbol.c (rb_id2str): eliminate branch to set class
Since the fstring table encompasses all strings in the
symbol table, we may reuse the fstring table walk to set
the class and eliminate the branch in rb_id2str.

* string.c (Init_String): use rb_cString immediately after definition
* symbol.c (rb_id2str): eliminate branch to set class

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-03 23:55:06 +00:00
normal 5c988df0dd string.c (rb_str_tmp_frozen_release): release embedded strings
Handle the embedded case first, since we may have an embedded
duplicate and non-embedded original string.

* string.c (rb_str_tmp_frozen_release): handle embedded strings
* test/ruby/test_io.rb (test_write_no_garbage): new test
  [ruby-core:78898] [Bug #13085]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30 21:54:32 +00:00
normal 9c4ba969a5 io.c: recycle garbage on write
* string.c (STR_IS_SHARED_M): new flag to mark shared multiple times
  (STR_SET_SHARED): set STR_IS_SHARED_M
  (rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions
  (str_new_frozen): set/unset STR_IS_SHARED_M as appropriate
* internal.h: declare new functions
* io.c (fwrite_arg, fwrite_do, fwrite_end): new
  (io_fwrite): use new functions

Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release
to manage a hidden, frozen string.  Reuse one bit of the embed
length for shared strings as STR_IS_SHARED_M to indicate a string
has been shared multiple times.  In the common case, the string
is only shared once so the object slot can be reclaimed immediately.

minimum results in each 3 measurements. (time and size)

Execution time (sec)
name                            trunk   built
io_copy_stream_write            0.682   0.254
io_copy_stream_write_socket     1.225   0.751

Speedup ratio: compare with the result of `trunk' (greater is better)
name    built
io_copy_stream_write            2.680
io_copy_stream_write_socket     1.630

Memory usage (last size) (B)
name                            trunk           built
io_copy_stream_write            95436800.000    6512640.000
io_copy_stream_write_socket     117628928.000   7127040.000

Memory consuming ratio (size) with the result of `trunk' (greater is better)
name    built
io_copy_stream_write            14.654
io_copy_stream_write_socket     16.505

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-30 20:40:18 +00:00
shugo d33726b837 string.c: rindex(//) should set $~.
This seems to be a bug introduced by r520 (1.4.0).  [ruby-core:79110] [Bug #13135]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-19 08:13:03 +00:00
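A minimal sketch of the fixed behaviour, assuming a Ruby version that includes this change. The helper `rindex_with_match` is hypothetical. It returns the index together with the text that `$~` reports afterwards:

```ruby
# Hypothetical helper: runs String#rindex with a Regexp and reports both
# the returned index and the matched text recorded in $~.
def rindex_with_match(str, pattern)
  index = str.rindex(pattern)
  [index, $~ && $~[0]]
end

p rindex_with_match("hello", /l/) # => [3, "l"]
```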
nobu 803621f6d7 file.c: refine message
* file.c (rb_get_path_check_convert): refine the error message
  when the path name contains null byte.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-16 02:43:55 +00:00
nobu 9029464175 string.c: replacement and block
* string.c (rb_enc_str_scrub): only one of replacement and block
  is allowed.  [ruby-core:79038] [Bug #13119]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 02:31:02 +00:00
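A short sketch of the two forms of String#scrub that this commit makes mutually exclusive: a replacement string or a block, used separately. The helper `scrub_with_hex` and the sample input are made up for illustration:

```ruby
# Hypothetical helper: replaces each invalid byte sequence with its hex
# dump in angle brackets, using the block form of String#scrub.
def scrub_with_hex(str)
  str.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }
end

broken = "abc\xFFdef" # 0xFF is never valid in UTF-8

p broken.valid_encoding? # => false
p broken.scrub("?")      # => "abc?def"
p scrub_with_hex(broken) # => "abc<ff>def"
```

Per the commit message, passing a replacement and a block in the same call is not allowed.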
nobu a3aa4da773 string.c: yield invalid part
* string.c (rb_enc_str_scrub): yield the invalid part only with
  ASCII-incompatible.  [ruby-core:79039] [Bug #13120]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57303 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 02:18:45 +00:00
nobu c763f0fb9b string.c: block for scrub with ASCII-incompatible
* string.c (rb_enc_str_scrub): honor the given block with
  ASCII-incompatible encoding.  [ruby-core:79039] [Bug #13120]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57302 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-01-11 01:03:37 +00:00
nobu 10bd48e402 string.c: CRLF in paragraph mode
* string.c (rb_str_enumerate_lines): allow CRLF to separate
  paragraphs.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25 23:56:55 +00:00
nobu 091f99b4b9 string.c: consistent paragraph mode with IO
* string.c (rb_str_enumerate_lines): in paragraph mode, do not
  include newlines which separate paragraphs, so that it will be
  consistent with IO#each_line.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57184 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-25 23:50:09 +00:00
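A minimal sketch of paragraph mode, which is selected by passing an empty string as the separator to String#each_line. The helper name `paragraphs` is hypothetical, and the input uses exactly one blank line between paragraphs:

```ruby
# Hypothetical helper: split +text+ into paragraphs via each_line("").
def paragraphs(text)
  text.each_line("").to_a
end

p paragraphs("a\nb\n\nc\nd\n") # => ["a\nb\n\n", "c\nd\n"]
```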
nobu d124fa3a35 string.c: suppress a warning
* string.c (rb_str_casecmp_p): [DOC] use Unicode escape form to
  get rid of warning C4819 by Microsoft Visual C++.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-22 22:16:19 +00:00
rhe 44ba4fd362 string.c: add missing size_t cast
Add size_t cast to avoid signed integer overflow. r56157 ("string.c:
avoid signed integer overflow", 2016-09-13) missed this. Suppresses
UBSan.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-20 06:53:45 +00:00
nobu 5d6292809f no crypt.h on FreeBSD 12
* string.c (crypt.h): crypt_r() was added in FreeBSD 12.0 but is
  declared in unistd.h.  [ruby-core:78664] [Bug #13038]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16 05:05:42 +00:00
nobu 75755ef159 fix chomping newline only line
* string.c (chomp_newline): fix chomping newline only line.
  rb_enc_prev_char returns NULL if no previous character and must
  not call rb_enc_ascget on it.  a patch by Ary Borenszweig
  <asterite AT gmail.com> at [ruby-core:78666].  [Bug #13037]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-16 01:12:09 +00:00
nobu c95388a58d string.c: fix method name in rdoc [ci skip]
* string.c (rb_str_equal): [DOC] fix fallback method name. the
  peer's == method will be used, not ===.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12 07:12:07 +00:00
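A sketch of the fallback this doc fix describes: when the right-hand operand is not a String but responds to to_str, String#== delegates to the peer's == method. The class `Label` is hypothetical and exists only for this illustration:

```ruby
# Hypothetical string-like class: responds to to_str and defines its own ==.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end

  # Called by String#== with the String as +other+.
  def ==(other)
    @text == other.to_str
  end
end

p "ruby" == Label.new("ruby") # => true
p "ruby" == Label.new("perl") # => false
```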
nobu 6dd5ee752a String#match? and Symbol#match?
* string.c (rb_str_match_m_p): inverse of Regexp#match?.  based on
  the patch by Herwin Weststrate <herwin@snt.utwente.nl>.
  [Fix GH-1483] [Feature #12898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-12 02:56:12 +00:00
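A brief sketch of String#match? and Symbol#match?, assuming Ruby 2.4 or later. The helper `match_without_state` is hypothetical. It shows that, like Regexp#match?, the predicate does not set `$~`:

```ruby
# Hypothetical helper: returns the match? result and the value of $~
# afterwards, inside a fresh method frame where $~ starts out nil.
def match_without_state(receiver, pattern)
  matched = receiver.match?(pattern)
  [matched, $~]
end

p match_without_state("Ruby", /R.../) # => [true, nil]
p match_without_state(:Ruby, /X/)     # => [false, nil]
```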
nobu d95f5bc81a string.c: chomp option
* string.c (rb_str_enumerate_lines): implement chomp option.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-03 14:18:03 +00:00
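A minimal sketch of the chomp option on String#each_line, assuming Ruby 2.4 or later. The helper `lines_of` is a made-up name:

```ruby
# Hypothetical helper: collect the lines of +text+, optionally chomped.
def lines_of(text, chomp: false)
  text.each_line(chomp: chomp).to_a
end

p lines_of("one\ntwo\n")              # => ["one\n", "two\n"]
p lines_of("one\ntwo\n", chomp: true) # => ["one", "two"]
```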
duerst dacf977a42 Fix/improve documentation of String/Symbol#casecmp[?]
Fix documentation of String#casecmp? (examples didn't have the '?').
Add an example with non-ASCII characters. Clarify that casecmp,
unlike casecmp?, only does case-insensitivity on A-Z/a-z.
[ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29 10:45:54 +00:00
nobu 07fb750fd0 string.c: use xmalloc
* string.c (rb_str_casemap): use xmalloc simply instead of
  ALLOC_N.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-29 03:06:01 +00:00
nobu 78b0d7ac1c string.c: fix zero-length array
* string.c (mapping_buffer): get rid of zero-length array member,
  which is not a part of C90.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 13:16:00 +00:00
nobu 196e8b4480 string.c: enable rdoc
* string.c (rb_str_casecmp_p): [DOC] move forward declaration of
  rb_str_downcase to enable rdoc.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 09:37:19 +00:00
duerst ad619e02c4 implement String/Symbol#casecmp? including Unicode case folding
* string.c: Implement String#casecmp? and Symbol#casecmp? by using
  String#downcase :fold for Unicode case folding. This does not include
  options such as :turkic, because these currently cannot be combined
  with the :fold option. This implements feature #12786.

* test/ruby/test_string.rb/test_symbol.rb: Tests for above.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-28 08:37:32 +00:00
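A hedged sketch contrasting the two methods, assuming Ruby 2.4 or later and UTF-8 strings. The helper `fold_compare` is hypothetical. casecmp folds only A-Z/a-z, while casecmp? applies Unicode case folding:

```ruby
# Hypothetical helper: returns [casecmp, casecmp?] for the same pair.
def fold_compare(a, b)
  [a.casecmp(b), a.casecmp?(b)]
end

lower = "\u00e4\u00f6\u00fc" # a, o and u with diaeresis, lowercase
upper = "\u00c4\u00d6\u00dc" # the same letters in uppercase

p fold_compare("abc", "ABC") # => [0, true]
p fold_compare(lower, upper) # => [1, true]
```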
nobu a2144bd72a chomp option
* io.c (extract_getline_opts): extract chomp option.
  [Feature #12553]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56581 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-05 07:28:09 +00:00
nobu 4e44f6ef86 [DOC] replace Fixnum with Integer [ci skip]
* numeric.c: [DOC] update document for Integer class.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-26 06:11:23 +00:00
nobu 3ba353fc1a Fixed typo [ci skip]
* string.c (rb_str_sub, rb_str_gsub): [DOC] 'backlash' should read
  'backslash'.  [Fix GH-1461]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-21 02:34:19 +00:00
usa c2dd2d268e * internal.h (ST2FIX): new macro to convert st_index_t to Fixnum.
  a hash value of Object might be Bignum, but it causes many troubles
  especially when the Object is used as a key of a hash.  so I've given
  up on doing so.

* array.c (rb_ary_hash): use above macro.

* bignum.c (rb_big_hash): ditto.

* hash.c (rb_obj_hash, rb_hash_hash): ditto.

* numeric.c (rb_dbl_hash): ditto.

* proc.c (proc_hash): ditto.

* re.c (rb_reg_hash, match_hash): ditto.

* string.c (rb_str_hash_m): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-04 16:25:01 +00:00
nobu 63d77c2a1b string.c: negative hash
* string.c (rb_str_hash_m): hash values may be negative.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56321 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01 22:51:23 +00:00
usa 7a44019031 * string.c (rb_str_hash_m): st_index_t is not guaranteed to be the same
  size as int, and of course the value is also not guaranteed to fit in
  a Fixnum.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-01 17:06:21 +00:00