(single_byte_optimizable): new function to test optimizationability
using single byte string.
(str_strlen): use single_byte_optimizable instead of
is_ascii_string.
(str_nth): rename argument: asc -> singlebyte.
(str_offset): ditto.
(rb_str_substr): use single_byte_optimizable instead of IS_7BIT.
(rb_str_index): ditto.
(rb_str_rindex): ditto.
(rb_str_splice): ditto.
(rb_str_justify): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/encoding.h (rb_enc_right_char_head): ditto.
* io.c (appendline): does multibyte RS search in the function.
* io.c (prepare_getline_args): RS may be nil.
* io.c (rb_io_getc): should process character based on external
encoding, when transcoding required.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14619 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* test/ruby/test_m17n.rb (TestM17N::test_tr_s): "invalid mbstring
sequence" is not an error to be tested.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14369 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_buf_append): should propagate encoding.
* string.c (rb_str_each_line): ditto.
* test/ruby/test_m17n.rb (TestM17N::test_str_each_line): should
check encoding as well.
* test/ruby/test_m17n.rb (TestM17N::test_str_each_line): empty
array can not propagate encoding; should not check.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14343 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (reg_match_pos): pos adjustment should be based on
characters.
* test/ruby/test_m17n.rb (TestM17N::test_str_insert): test updated
to check negative offset behavior.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
head of a character.
* string.c (rb_str_chomp_bang): check if match start at the head
of a character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14334 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
encoding option is specified for regexp literals with \u{}
escapes.
* string.c (rb_str_squeeze_bang): should squeeze multibyte
characters as well.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14275 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rb_enc_precise_mbclen.
(rb_str_valid_encoding_p): just check coderange is
ENC_CODERANGE_BROKEN or not.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14262 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* sprintf.c (rb_str_format): check encoding based on result, not
the format string.
* string.c (rb_str_upto): add encoding check.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14256 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
performance boost for worst case.
* string.c (str_strlen): direct size if string is 7bit only.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14221 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is empty.
* string.c (rb_str_justify): associate encoding of original to the
result.
* string.c (rb_str_chomp_bang): need to check encoding of record
separator.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14211 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_sub_bang, str_gsub): should check and copy encoding
to be replaced.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (transcoding): added a pointer to function to flush.
* transcode.c (transcode_loop): do not use string internal.
[ruby-dev:32512]
* transcode.c (str_transcode): allow Encoding objects.
* transcode_data.h (BYTE_LOOKUP): use actual struct name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14176 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_enc_nth): don't check the return value of rb_enc_mbclen.
(rb_enc_strlen): ditto.
(rb_enc_precise_mbclen): return needmore(1) if e <= p.
(rb_enc_get_ascii): new function for extracting ASCII character.
* include/ruby/encoding.h (rb_enc_get_ascii): declared.
* include/ruby/regex.h (ismbchar): removed.
* re.c (rb_reg_expr_str): use rb_enc_get_ascii.
(unescape_escaped_nonascii): use rb_enc_precise_mbclen to determine
the termination of escaped non-ASCII character.
(unescape_nonascii): use rb_enc_precise_mbclen.
(rb_reg_quote): use rb_enc_get_ascii.
(rb_reg_regsub): use rb_enc_get_ascii.
* string.c (rb_str_reverse) don't check the return value of
rb_enc_mbclen.
(rb_str_split_m): don't call rb_enc_mbclen with e <= p.
* parse.y (is_identchar): use ISASCII.
(parser_ismbchar): removed.
(parser_precise_mbclen): new macro.
(parser_isascii): new macro.
(parser_tokadd_mbchar): use parser_precise_mbclen to check invalid
character precisely.
(parser_tokadd_string): use parser_isascii.
(parser_yylex): ditto.
(is_special_global_name): don't call is_identchar with e <= p.
(rb_enc_symname_p): ditto.
[ruby-dev:32455]
* ext/tk/sample/tkextlib/vu/canvSticker2.rb: remove coding cookie
because the encoding is not UTF-8. [ruby-dev:32475]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
validation.
* include/ruby/encoding.h (rb_enc_precise_mbclen): declared.
(MBCLEN_CHARFOUND): new macro.
(MBCLEN_INVALID): new macro.
(MBCLEN_NEEDMORE): new macro.
* include/ruby/oniguruma.h (OnigEncodingTypeST): replace mbc_enc_len
by precise_mbc_enc_len.
(ONIGENC_PRECISE_MBC_ENC_LEN): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_INVALID): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBCLEN_CHARFOUND): new macro.
(ONIGENC_MBCLEN_INVALID): new macro.
(ONIGENC_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBC_ENC_LEN): use ONIGENC_PRECISE_MBC_ENC_LEN.
* enc/euc_jp.c: validation implemented.
* enc/sjis.c: ditto.
* enc/utf8.c: ditto.
* string.c (rb_str_inspect): use rb_enc_precise_mbclen for invalid
encoding.
(rb_str_valid_encoding_p): new method String#valid_encoding?.
* io.c (rb_io_getc): use rb_enc_precise_mbclen.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14119 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rename ENC_CODERANGE_SINGLE to ENC_CODERANGE_7BIT.
rename ENC_CODERANGE_MULTI to ENC_CODERANGE_8BIT.
Because single byte 8bit character, such as Shift_JIS 1byte katakana,
is represented by ENC_CODERANGE_MULTI even if it is not multi byte.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_enc_str_asciicompat_p): defined.
* re.c (rb_reg_initialize_str): use rb_enc_str_asciionly_p.
(rb_reg_quote): return ascii-8bit string if the argument is
ascii-only to generate encoding generic regexp if possible.
(rb_reg_s_union): fix encoding handling. [ruby-dev:32094]
* string.c (rb_enc_str_asciionly_p): defined.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14013 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
a pointer to a char to avoid SEGV with "\377".tr("a", "b").
on FreeBSD/amd64.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encoding.c (rb_enc_alias): check if original name is registered.
* encoding.c (rb_enc_init): register in same order as kcode options in
re.c. added new aliases.
* string.c (rb_str_force_encoding): check if valid encoding name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13643 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (Init_Regexp): new method Regexp#encoding.
* string.c (str_encoding): moved to encoding.c
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cache bits.
* include/ruby/encoding.h (ENC_CODERANGE_SET): fixed a bug not to
set chache bits.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
of elements from an array. [ruby-list:42671]
* array.c (rb_ary_product): a new method to get all combinations
of elements from two arrays. can be extended to combinations of
n-arrays, e.g. a.product(b,c,d). anyone volunteer?
* array.c (rb_ary_permutation): empty function body to calculate
permutations of array elements. need volunteer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13568 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encoding.c (rb_enc_find_index): search the encoding which has the
given name and return its index if found, or -1.
* st.c (type_strcasehash): case-insensitive string hash type.
* string.c (rb_str_force_encoding): force encoding of self. this name
comes from [ruby-dev:31894] by Martin Duerst. [ruby-dev:31744]
* include/ruby/encoding.h (rb_enc_find_index, rb_enc_associate_index):
prototyped.
* include/ruby/encoding.h (rb_enc_isctype): direct interface to ctype.
* include/ruby/st.h (st_init_strcasetable): prototyped.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13556 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
condition statement much shorter, if no else clause is needed.
* string.c (rb_str_match_m): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13475 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
object is encoding capable. [ruby-dev:31780]
* string.c (rb_str_subpat_set): check for if the argument is a String.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_rindex_m): was confusing character offset and
byte offset.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
if rs (optional argument) is an empty string. [ruby-dev:31652]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
object or nil if it's not target-type. this mechanism is used
to convert types in the C implemented methods.
* hash.c (rb_hash_s_try_convert): ditto.
* io.c (rb_io_s_try_convert): ditto.
* re.c (rb_reg_s_try_convert): ditto.
* string.c (rb_str_s_try_convert): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13251 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
with #to_str method, as well as rb_str_index_m. [ruby-core:11692]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@12805 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
free()ing memory. a patch from Yusuke ENDOH <mame AT tsg.ne.jp>.
[ruby-dev:31062]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@12630 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
object.c, string.c, variable.c, vm_macro.def: revert private
instance variable feature, which is postponed until next major
release.
* marshal.c: TYPE_SYMBOL2 removed; MARSHAL_MINOR reverted back to
8th version.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11813 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
returns a codepoint for the first character in the string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11809 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
symbol from a symbol and a class. back-ported from matzruby.
* parse.y (rb_decompose_ivar2): reverse function of
rb_compose_ivar2().
* marshal.c (w_symbol): support class local instance variables.
* marshal.c (r_object0): ditto.
* compile.c (defined_expr): ditto.
* compile.c (iseq_compile_each): ditto.
* insns.def: add two new instructions: getinstancevariable2 and
setinstancevariable2.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11630 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
before actually modifying the string.
fixed: [ruby-dev:30211] (originally reported by zunda)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11598 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rb_str_upcase_bang, rb_str_downcase_bang, rb_str_swapcase_bang):
add RDoc description that case conversion to be effective only
in ASCII region.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11204 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
string, but not the shared string. fixed: [ruby-core:09152]
* strnig.c (rb_str_new4): keep shared string untainted when orignal
string is tainted. fixed: [ruby-dev:29672]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11201 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is given. in other words, lines become an alias to each_line.
[ruby-core:09218]
* string.c (rb_str_each_byte): ditto for bytes in place of lines.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11186 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
based on the patch from Eric Mahurin <eric_mahurin at yahoo.com>.
[ruby-core:05861]
* array.c (rb_ary_unshift_m): ditto.
* array.c (ary_make_shared): ditto.
* array.c (RESIZE_CAPA): ditto.
* array.c (rb_ary_free): new function to free memory. code moved
from gc.c.
* string.c (rb_str_free): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11038 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
starts with given prefix.
* string.c (rb_str_endwith): the opposite of String#startwith?.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10984 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
by a separator. taken from Python 2.5.
* string.c (rb_str_rpartition): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10974 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (sym_hash): cache hash value in aux.shared if possible.
* gc.c (rb_obj_id): no need to treat symbols specially.
* lib/fileutils.rb (FileUtils::FileUtils): singleton_methods() no
longer return an array of strings, but of symbols.
* lib/delegate.rb (DelegateClass): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
level is greater than zero. [ruby-core:08862]
* parse.y (rb_interned_p): new function to check if a string is
already interned.
* string.c (str_to_id): use rb_str_intern().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@10932 b2dd03c8-39d4-4d8f-98ff-823fe69b080e