ASCII. [ruby-talk:287225]
* string.c (rb_str_each_line): use rb_enc_is_newline() to gain
performance if the record separator ($/) is not modified.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15163 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(ENCINDEX_UTF_8): renamed from ENCINDEX_UTF8.
(rb_enc_init): use ENC_REGISTER.
* include/ruby/oniguruma.h (OnigEncodingUTF8, ONIG_ENCODING_UTF8):
removed.
* enc/*.c: remove use of &encoding_*; use enc argument instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ruby_encoding_index to avoid linear search in rb_enc_to_index.
* include/ruby/encoding.h (rb_enc_to_index): macro defined to use
ruby_encoding_index.
* encoding.c (rb_enc_to_index): removed.
(enc_register_at): initialize ruby_encoding_index member.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14931 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (coderange_scan): extracted from rb_enc_str_coderange.
(rb_enc_str_coderange): use coderange_scan.
(rb_str_shared_replace): copy encoding and coderange.
(rb_enc_str_buf_cat): new function for linear complexity string
accumulation with encoding.
(rb_str_sub_bang): don't conflict substituted part and replacement.
(str_gsub): use rb_enc_str_buf_cat.
(rb_str_clear): clear coderange.
* re.c (rb_reg_regsub): use rb_enc_str_buf_cat.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14910 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
string, using gcc's __builtin_constant_p and statement expression.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14888 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
avoid "i".to_i(36) cause 0 under tr_TR locale.
This is newly implemented, not a copy of missing/strtoul.c.
* include/ruby/ruby.h (ruby_strtoul): declared.
(STRTOUL): defined to use ruby_strtoul.
* bignum.c, pack.c, ext/socket/socket.c: use STRTOUL.
* configure.in (strtoul): don't check.
* missing/strtoul.c: removed.
* include/ruby/missing.h (strtoul): removed.
* common.mk (strtoul.o): removed.
* LEGAL (missing/strtoul.c): removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* sprintf.c (rb_enc_sprintf, rb_enc_vsprintf): new functions to format
arguments with encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14806 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/encoding.h (rb_enc_right_char_head): ditto.
* io.c (appendline): does multibyte RS search in the function.
* io.c (prepare_getline_args): RS may be nil.
* io.c (rb_io_getc): should process character based on external
encoding, when transcoding required.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14619 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* load.c (search_required): returns path too if feature is being
loaded. [ruby-dev:32048] [TODO: refactoring]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14586 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
encoding.
* include/ruby/encoding.h (rb_enc_codepoint): macro is replaced as a
declaration.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14524 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ruby.c, transcode.c: rename rb_ascii_encoding. to
rb_ascii8bit_encoding. rb_ascii_encoding is ambiguous with
ASCII-8BIT and US-ASCII.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/ruby.h (rb_block_call_func): function to be called back
as block.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14416 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_enc_nth): don't check the return value of rb_enc_mbclen.
(rb_enc_strlen): ditto.
(rb_enc_precise_mbclen): return needmore(1) if e <= p.
(rb_enc_get_ascii): new function for extracting ASCII character.
* include/ruby/encoding.h (rb_enc_get_ascii): declared.
* include/ruby/regex.h (ismbchar): removed.
* re.c (rb_reg_expr_str): use rb_enc_get_ascii.
(unescape_escaped_nonascii): use rb_enc_precise_mbclen to determine
the termination of escaped non-ASCII character.
(unescape_nonascii): use rb_enc_precise_mbclen.
(rb_reg_quote): use rb_enc_get_ascii.
(rb_reg_regsub): use rb_enc_get_ascii.
* string.c (rb_str_reverse) don't check the return value of
rb_enc_mbclen.
(rb_str_split_m): don't call rb_enc_mbclen with e <= p.
* parse.y (is_identchar): use ISASCII.
(parser_ismbchar): removed.
(parser_precise_mbclen): new macro.
(parser_isascii): new macro.
(parser_tokadd_mbchar): use parser_precise_mbclen to check invalid
character precisely.
(parser_tokadd_string): use parser_isascii.
(parser_yylex): ditto.
(is_special_global_name): don't call is_identchar with e <= p.
(rb_enc_symname_p): ditto.
[ruby-dev:32455]
* ext/tk/sample/tkextlib/vu/canvSticker2.rb: remove coding cookie
because the encoding is not UTF-8. [ruby-dev:32475]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
validation.
* include/ruby/encoding.h (rb_enc_precise_mbclen): declared.
(MBCLEN_CHARFOUND): new macro.
(MBCLEN_INVALID): new macro.
(MBCLEN_NEEDMORE): new macro.
* include/ruby/oniguruma.h (OnigEncodingTypeST): replace mbc_enc_len
by precise_mbc_enc_len.
(ONIGENC_PRECISE_MBC_ENC_LEN): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_INVALID): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBCLEN_CHARFOUND): new macro.
(ONIGENC_MBCLEN_INVALID): new macro.
(ONIGENC_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBC_ENC_LEN): use ONIGENC_PRECISE_MBC_ENC_LEN.
* enc/euc_jp.c: validation implemented.
* enc/sjis.c: ditto.
* enc/utf8.c: ditto.
* string.c (rb_str_inspect): use rb_enc_precise_mbclen for invalid
encoding.
(rb_str_valid_encoding_p): new method String#valid_encoding?.
* io.c (rb_io_getc): use rb_enc_precise_mbclen.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14119 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_preprocess): new function for dynamic regexp with
\u{} such as Regexp.new("\\u{6666}").
(rb_reg_prepare_re): preprocess regexp for recompiling.
(read_escaped_byte): new function.
(unescape_escaped_nonascii): new function.
(append_utf8): new function.
(unescape_unicode_list): new function.
(unescape_unicode_bmp): new function.
(unescape_nonascii): new function.
(rb_reg_initialize): preprocess regexp.
* pack.c (rb_uv_to_utf8): renamed from uv_to_utf8.
* parse.y (STR_NEW3): take func instead of has8 and hasmb.
(parser_str_new): use default coderange mechanism except for regexp.
(parser_tokadd_utf8): copy regexp source as-is.
(parser_read_escape): UTF-8 stuff removed.
(parser_tokadd_escape): has8bit and hasmb removed.
(parser_tokadd_string): fix 8-bit single byte character with \u.
(parser_parse_string): has8bit and hasmb removed.
(parser_here_document): has8bit and hasmb removed.
(parser_yylex): call parser_tokadd_utf8 instead of read_escape for
UTF-8 character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
thread. [ruby-talk:281318]
* signal.c (ruby_sig_finalize): do not install SIG_DFL handler if
previous handler is sighandler().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rename ENC_CODERANGE_SINGLE to ENC_CODERANGE_7BIT.
rename ENC_CODERANGE_MULTI to ENC_CODERANGE_8BIT.
Because single byte 8bit character, such as Shift_JIS 1byte katakana,
is represented by ENC_CODERANGE_MULTI even if it is not multi byte.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_enc_str_asciicompat_p): defined.
* re.c (rb_reg_initialize_str): use rb_enc_str_asciionly_p.
(rb_reg_quote): return ascii-8bit string if the argument is
ascii-only to generate encoding generic regexp if possible.
(rb_reg_s_union): fix encoding handling. [ruby-dev:32094]
* string.c (rb_enc_str_asciionly_p): defined.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14013 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_struct_define_without_accessor): add allocator to the arguments.
* range.c (range_alloc): re-introduced using rb_struct_alloc_noinit.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14003 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
instead of socketpair when mode is RDWR.
* io.c (pipe_open): pass &write_fd to rb_w32_pipe_exec().
* io.c (popen_redirect): define only when HAVE_FORK.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
given. [ruby-core:12697]
* range.c (range_last): returns last n elements if argument is
given.
* array.c (rb_ary_subseq, rb_ary_last): export.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
parameter to every function members.
* include/ruby/oniguruma.h (OnigEncodingTypeST): add auxiliary
data member to provide user defined data for an encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13674 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (Init_Regexp): new method Regexp#encoding.
* string.c (str_encoding): moved to encoding.c
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cache bits.
* include/ruby/encoding.h (ENC_CODERANGE_SET): fixed a bug not to
set chache bits.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (yycompile): use encoding of the source as default.
* ruby.c (proc_options, load_file): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encoding.c (rb_enc_find_index): search the encoding which has the
given name and return its index if found, or -1.
* st.c (type_strcasehash): case-insensitive string hash type.
* string.c (rb_str_force_encoding): force encoding of self. this name
comes from [ruby-dev:31894] by Martin Duerst. [ruby-dev:31744]
* include/ruby/encoding.h (rb_enc_find_index, rb_enc_associate_index):
prototyped.
* include/ruby/encoding.h (rb_enc_isctype): direct interface to ctype.
* include/ruby/st.h (st_init_strcasetable): prototyped.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13556 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* marshal.c (struct dump_arg, struct load_arg): added wrappers to mark
data and compat_tbl entries. [ruby-dev:31870]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13528 b2dd03c8-39d4-4d8f-98ff-823fe69b080e