from rb_econv_stateless_encoding to apply stateless ASCII
incompatible encodings such as UTF-16BE.
* io.c (make_writeconv): use rb_econv_asciicompat_encoding.
* transcode_data.h (rb_transcoder_asciicompat_type_t): renamed from
rb_transcoder_stateful_type_t.
(rb_transcoder): use rb_transcoder_asciicompat_type_t.
* transcode.c: follow the type change.
(asciicompat_encoding_i): renamed from stateless_encoding_i.
(rb_econv_asciicompat_encoding): renamed from
rb_econv_stateless_encoding.
(econv_s_asciicompat_encoding): method renamed.
* tool/transcode-tblgen.rb: follow the type change.
* enc/trans/utf_16_32.trans: follow the type change.
rb_from_UTF_16BE to UTF-8 is asciicompat_decoder.
rb_from_UTF_16LE to UTF-8 is asciicompat_decoder.
rb_from_UTF_32BE to UTF-8 is asciicompat_decoder.
rb_from_UTF_32LE to UTF-8 is asciicompat_decoder.
UTF-8 to rb_to_UTF_16BE is asciicompat_encoder.
UTF-8 to rb_to_UTF_16LE is asciicompat_encoder.
UTF-8 to rb_to_UTF_32BE is asciicompat_encoder.
UTF-8 to rb_to_UTF_32LE is asciicompat_encoder.
* enc/trans/newline.trans: follow the type change. universal newline
decoder is asciicompat_converter.
* enc/trans/escape.trans: follow the type change.
* enc/trans/iso2022.trans: ditto.
* enc/trans/japanese.trans: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19249 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
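The renamed lookup is reachable from Ruby as `Encoding::Converter.asciicompat_encoding`. The following is a minimal sketch, assuming a Ruby build with the standard transcoders available; the variable names are illustrative only.

```ruby
# For an ASCII-incompatible encoding, ask which ASCII-compatible
# encoding can stand in for it during conversion.
utf16be_compat = Encoding::Converter.asciicompat_encoding("UTF-16BE")

# An encoding that is already ASCII-compatible has no such counterpart.
utf8_compat = Encoding::Converter.asciicompat_encoding("UTF-8")

p utf16be_compat
p utf8_compat
```

The Ruby documentation lists UTF-8 as the result for UTF-16BE and `nil` for UTF-8 itself.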
undefined conversion error between iso-2022-jp and the corresponding
stateless encoding.
* enc/emacs_mule.c: replicate emacs-mule as stateless-iso-2022-jp.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19220 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/transcode-tblgen.rb: generate an empty line after str1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(fun_so_escape_html_attr): new function.
(escape_html_attr_finish): new function.
(rb_escape_html_attr): use them to quote the converted result.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19173 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
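The quoting behaviour described for rb_escape_html_attr can be observed from Ruby through the `xml: :attr` option of `String#encode`. This is a small sketch with a made-up input string, not a description of the C implementation.

```ruby
# Hypothetical attribute value containing all four escaped characters.
raw = "<a href=\"x\">&"

# xml: :attr escapes & < > " and wraps the result in double quotes.
attr = raw.encode(xml: :attr)

puts attr
```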
add state field.
(TRANSCODING_STATE): defined.
(rb_transcoder): add fields: state_size, state_init_func,
state_fini_func.
change rb_transcoding* argument to void*.
* transcode.c (transcode_restartable0): use TRANSCODING_STATE for
first arguments of transcoder functions.
(rb_transcoding_open_by_transcoder): initialize state field.
(rb_transcoding_close): finalize state field.
* tool/transcode-tblgen.rb: provide state size/init/fini.
* enc/trans/newline.trans (universal_newline_init): defined.
(fun_so_universal_newline): take void* as a state pointer.
(rb_universal_newline): provide state size/init/fini.
(rb_crlf_newline): ditto.
(rb_cr_newline): ditto.
* enc/trans/iso2022.trans (iso2022jp_init): defined.
(fun_si_iso2022jp_to_eucjp): take void* as a state pointer.
(fun_so_iso2022jp_to_eucjp): ditto.
(fun_so_eucjp_to_iso2022jp): ditto.
(iso2022jp_reset_sequence_size): ditto.
(finish_eucjp_to_iso2022jp): ditto.
(rb_ISO_2022_JP_to_EUC_JP): provide state size/init/fini.
(rb_EUC_JP_to_ISO_2022_JP): ditto.
* enc/trans/utf_16_32.trans (fun_so_from_utf_16be): take void* as a
state pointer.
(fun_so_to_utf_16be): ditto.
(fun_so_from_utf_16le): ditto.
(fun_so_to_utf_16le): ditto.
(fun_so_from_utf_32be): ditto.
(fun_so_to_utf_32be): ditto.
(fun_so_from_utf_32le): ditto.
(fun_so_to_utf_32le): ditto.
(rb_from_UTF_16BE): provide state size/init/fini.
(rb_to_UTF_16BE): ditto.
(rb_from_UTF_16LE): ditto.
(rb_to_UTF_16LE): ditto.
(rb_from_UTF_32BE): ditto.
(rb_to_UTF_32BE): ditto.
(rb_from_UTF_32LE): ditto.
(rb_to_UTF_32LE): ditto.
* enc/trans/japanese.trans (fun_so_eucjp2sjis): take void* as a state
pointer.
(fun_so_sjis2eucjp): ditto.
(rb_eucjp2sjis): provide state size/init/fini.
(rb_sjis2eucjp): provide state size/init/fini.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
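Why a converter such as EUC-JP to ISO-2022-JP needs per-conversion state can be seen from Ruby: the encoder has to remember which character set is currently designated and switch back to ASCII at the end. The sketch below only inspects the output bytes; it assumes a UTF-8 source file and does not touch the C state fields.

```ruby
# HIRAGANA LETTER A, encoded into the stateful ISO-2022-JP encoding.
jis = "\u3042".encode("ISO-2022-JP")

# ESC $ B switches to JIS X 0208, 24 22 is the character,
# ESC ( B returns to ASCII when the conversion finishes.
puts jis.bytes.map { |b| format("%02X", b) }.join(" ")

# Decoding restores the original character.
back = jis.encode("UTF-8")
```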
code. use it in rb_transcoder.
* enc/trans/newline.trans: use TRANSCODE_TABLE_INFO.
* enc/trans/iso2022.trans: ditto.
* enc/trans/utf_16_32.trans: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
eucJP-ms.
* enc/trans/japanese.trans (eucJP-ms): eucJP-ms is the correct
name of the encoding in Ruby.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/encoding.h (rb_econv_t): new field: flags.
(rb_econv_binmode): declared.
* io.c (io_unread): text mode hack removed.
(NEED_NEWLINE_DECODER): defined.
(NEED_NEWLINE_ENCODER): defined.
(NEED_READCONV): defined.
(NEED_WRITECONV): defined.
(TEXTMODE_NEWLINE_ENCODER): defined for windows.
(make_writeconv): setup converter with TEXTMODE_NEWLINE_ENCODER for
text mode.
(io_fwrite): use NEED_WRITECONV. character code conversion is
disabled if fptr->writeconv_stateless is nil.
(make_readconv): setup converter with
ECONV_UNIVERSAL_NEWLINE_DECODER for text mode.
(read_all): use NEED_READCONV.
(appendline): use NEED_READCONV.
(rb_io_getline_1): use NEED_READCONV.
(io_getc): use NEED_READCONV.
(rb_io_ungetc): use NEED_READCONV.
(rb_io_binmode): OS-level text mode test removed. call
rb_econv_binmode.
(rb_io_binmode_m): call rb_io_binmode with write_io as well.
(rb_io_flags_mode): return mode string including "t".
(rb_io_mode_flags): detect "t" for text mode.
(rb_sysopen): always specify O_BINARY.
* transcode.c (rb_econv_open_by_transcoder_entries): initialize flags.
(rb_econv_open): if source and destination encodings are
both empty strings, open newline converter. last_tc will be NULL in
this case.
(rb_econv_encoding_to_insert_output): last_tc may be NULL now.
(rb_econv_string): ditto.
(output_replacement_character): ditto.
(transcode_loop): ditto.
(econv_init): ditto.
(econv_inspect): ditto.
(rb_econv_binmode): new function.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
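The newline-only converter opened with two empty encoding names can be exercised from Ruby. A minimal sketch, assuming the keyword form of the option accepted by current Ruby; the input string is invented for illustration.

```ruby
# Empty source and destination names: no character code conversion,
# only the universal newline decoder is installed.
ec = Encoding::Converter.new("", "", universal_newline: true)

# CRLF and lone CR are both normalized to LF.
normalized = ec.convert("a\r\nb\rc") + ec.finish

p normalized
```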
in_data_start, in_data_end, in_buf_end and last_trans_index.
(rb_econv_output): removed.
(rb_econv_insert_output): declared.
(rb_econv_encoding_to_insert_output): declared.
* enc/trans/newline.trans (rb_universal_newline): stateful_type
changed.
* transcode.c (transcode_restartable0): initialize inchar_start,
tc->recognized_len and next_table at beginning of the loop.
(rb_econv_open_by_transcoder_entries): initialize new fields.
(rb_econv_open): setup last_trans_index.
(trans_sweep): last out_buf_start can be non-NULL now.
(rb_econv_convert): check last out_buf_start and in_buf_start at
first.
(rb_econv_output_with_destination_encoding): removed.
(econv_just_convert): removed.
(rb_econv_output): removed.
(econv_primitive_output): method removed.
(rb_econv_encoding_to_insert_output): new function.
(allocate_converted_string): new function.
(rb_econv_insert_output): new function.
(econv_primitive_insert_output): new method.
(output_replacement_character): use rb_econv_insert_output. unused
arguments removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18654 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
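The Ruby-level counterpart of rb_econv_insert_output is `Encoding::Converter#insert_output`. The sketch below follows the pattern shown in the Ruby documentation: stop at an undefined conversion, insert a replacement, then resume. The `<err>` marker is an arbitrary placeholder.

```ruby
ec = Encoding::Converter.new("utf-8", "iso-8859-1")
src = +"HIRAGANA LETTER A is \u{3042}."  # unfrozen: primitive_convert consumes it
dst = +""

# The hiragana has no ISO-8859-1 mapping, so conversion stops here.
first = ec.primitive_convert(src, dst)

# Insert text into the output at the point of the error, then continue.
ec.insert_output("<err>")
second = ec.primitive_convert(src, dst)

p first, second, dst
```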
* transcode_data.h (rb_transcoder): add resetsize_func field.
* enc/trans/iso2022.trans (iso2022jp_reset_sequence_size): defined.
(rb_EUC_JP_to_ISO_2022_JP): provide resetsize_func.
* tool/transcode-tblgen.rb: set NULL for resetsize_func.
* transcode.c (rb_econv_output): new function for inserting output.
(output_replacement_character): use rb_econv_output.
(transcode_loop): check return value of
output_replacement_character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_cr_newline): new transcoder.
* transcode.c (trans_open_i): one more extra room for input newline
converter.
(rb_trans_open): crlf newline and cr newline implemented.
(Init_transcode): Encoding::Converter::CRLF_NEWLINE and
Encoding::Converter::CR_NEWLINE defined.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
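The effect of the two output newline converters can be sketched with the `crlf_newline` and `cr_newline` options that current Ruby accepts in `String#encode`. The option names are those of modern Ruby and may differ from the constants defined in this revision.

```ruby
text = "line1\nline2\n"

# LF -> CRLF
crlf = text.encode(crlf_newline: true)
# LF -> CR
cr = text.encode(cr_newline: true)

p crlf, cr
```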
resetting a state of stateful encoding.
* enc/trans/iso2022.trans (rb_EUC_JP_to_ISO_2022_JP): specify
finish_eucjp_to_iso2022jp for resetstate_func.
* tool/transcode-tblgen.rb: specify NULL for resetstate_func.
* transcode.c (output_replacement_character): call resetstate_func
before appending the replacement character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18503 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Thu Aug 7 23:43:11 2008 Tanaka Akira <akr@fsij.org>
* transcode_data.h (rb_transcoding): new field "stateful".
(rb_transcoder): preprocessor and postprocessor field removed.
change arguments of func_ii, func_si, func_io and func_so.
new field "finish_func".
* tool/transcode-tblgen.rb: make FUNii, FUNsi and FUNio
generatable.
* transcode.c (transcoder_lib_table): removed.
(transcoder_table): change structure.
(transcoder_key): removed because of the above structure change.
(make_transcoder_entry): new function.
(get_transcoder_entry): ditto.
(rb_register_transcoder): follow the structure change.
(declare_transcoder): ditto.
(transcode_search_path): new function for breadth first search to
find a list of converters.
(transcode_search_path_i): new function.
(transcode_dispatch_cb): ditto.
(transcode_dispatch): use transcode_search_path.
(transcode_loop): follow the argument change.
(str_transcode): preprocessor and postprocessor stuff removed.
* enc/trans/iso2022.erb.c: new file. ISO-2022-JP conversion
re-implemented.
* enc/trans/japanese.erb.c: ISO-2022-JP stuff removed.
* enc/trans/utf_16_32.erb.c: follow argument change of FUNso.
[ruby-dev:35798]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18419 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
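The breadth first search mentioned for transcode_search_path can be illustrated as follows. This is a sketch of the general technique only, not the C implementation: the converter table and the `search_path` helper are hypothetical.

```ruby
# Hypothetical table: each pair is one registered single-step converter.
CONVERTERS = [
  ["ISO-8859-1", "UTF-8"],
  ["UTF-8", "EUC-JP"],
  ["EUC-JP", "ISO-2022-JP"],
  ["UTF-8", "UTF-16BE"],
].freeze

# Returns the shortest list of [from, to] steps, or nil if unreachable.
def search_path(src, dst, converters = CONVERTERS)
  return [] if src == dst
  prev = { src => nil }  # visited encodings with back-pointers
  queue = [src]
  until queue.empty?
    cur = queue.shift
    converters.each do |from, to|
      next unless from == cur && !prev.key?(to)
      prev[to] = cur
      if to == dst
        # Walk the back-pointers to rebuild the converter list.
        path = []
        node = dst
        while prev[node]
          path.unshift([prev[node], node])
          node = prev[node]
        end
        return path
      end
      queue << to
    end
  end
  nil
end

path = search_path("ISO-8859-1", "ISO-2022-JP")
none = search_path("UTF-16BE", "UTF-8")
p path, none
```

Because the queue is processed in first-in, first-out order, the first path that reaches the destination uses the fewest converters.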
sources.
* enc/trans/{japanese,korean,single_byte,utf_16_32}.c: to be
autogenerated now.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18376 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because OnigCodePoint will be used as 32bit signed int.
Masking by 0x7FFFFFFF is ok on GB18030;
Minimum 4-byte character is 0x81308130.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16737 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (init_transcoder_table): moved to enc/trans/transdb.c.
* enc/depend (enc/encdb.o enc/trans/transdb.o): depend on
corresponding headers.
* common.mk (COMMONOBJS): moved transcode.o from OBJS
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cross-compiling.
* ext/extmk.rb, enc/make_encmake.rb, lib/mkmf.rb: need to be 1.8
compatible for cross-compiling.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15616 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
iso_8859_2.c,iso_8859_6.c,iso_8859_7.c,iso_8859_8.c,iso_8859_9.c,
shift_jis.c,windows_1251.c}: add document about encodings.
* enc/cp949.c: divided into new file.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/mkmf.rb: revert r15443. "\\1#{sep}\\2" is wrong if sep ends
with "\\".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15455 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode_data.h (rb_transcoding): include pointer to rb_transcoder
and auxiliary data.
* transcode_data.h (rb_transcoder): all callback functions should have
their own parameters.
* enc/trans/{japanese,single_byte}.c: constified.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/trans/utf_16_32.c: new file, currently implementing
UTF-16BE conversions only.
* test/ruby/test_transcode.rb: Added tests for UTF-16BE;
made check_both_ways() use force_encoding differently.
* transcode_data.h, transcode.c: Support for more conversion
functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15142 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
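A both-ways check for UTF-16BE, in the spirit of the tests mentioned above, might look like the sketch below. It uses `String#encode` from current Ruby and assumes a UTF-8 source file; it is not the code from test_transcode.rb.

```ruby
utf8 = "A\u3042"

# Each character becomes one big-endian 16-bit unit.
utf16 = utf8.encode("UTF-16BE")
puts utf16.bytes.map { |b| format("%02X", b) }.join(" ")

# Converting back must restore the original string.
round_trip = utf16.encode("UTF-8")
```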
(ENCINDEX_UTF_8): renamed from ENCINDEX_UTF8.
(rb_enc_init): use ENC_REGISTER.
* include/ruby/oniguruma.h (OnigEncodingUTF8, ONIG_ENCODING_UTF8):
removed.
* enc/*.c: remove use of &encoding_*; use enc argument instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
of using fixed index value.
* enc/Makefile.in (encsrcdir): make US-ASCII built-in.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/ascii.c: Exchanged order of arguments for one ENC_ALIAS
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/make_encdb.rb (enc_name_list): constified.
* enc/make_encdb.rb (enc_init_db): moved some functions to encoding.c.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/sjis.c: move to enc/shift_jis.c, to make encoding name equal to
filename for convenience of loading lib.
* enc/shift_jis.c: moved from enc/sjis.c.
* common.mk: follows enc/shift_jis.c.
* enc/Makefile.in: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15014 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk (encdb.h): pass enc dir from outside to make_encdb.rb.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15010 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regenc.h (ENC_REPLICATE, ENC_ALIAS): added for defining replica
encoding and encoding alias.
* encoding.c (rb_enc_init): move alias definitions to enc/*.c.
(rb_enc_find_index): search original of replica and alias when no
encoding library.
(rb_enc_name_list, rb_enc_aliases_enc_i, rb_enc_aliases_str_i,
rb_enc_aliases, Encoding.name_list, Encoding.aliases): added.
(Init_Encoding): init encdb.
* enc/ascii.c, enc/us_ascii.c, enc/euc_jp.c, enc/sjis.c:
add replica encoding and encoding alias definitions.
* common.mk (dist-clean-local): add rule to remove encdb.h.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15007 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
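The two class methods added here can be tried directly. A short sketch against current Ruby, where the exact set of names and aliases depends on the build:

```ruby
# All known encoding names, including aliases, as strings.
names = Encoding.name_list

# Hash mapping each alias to the name of its original encoding.
aliases = Encoding.aliases

p names.include?("US-ASCII")
p aliases["ASCII"]
```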
* common.mk: clean golf, conf*, preludes, and so on.
* enc/depend: silence rm and ignore its errors.
* enc/Makefile.in: should define prefix and exec_prefix.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14791 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (transcode_dispatch): reverted some of the changes
in r14746.
* transcode.c, enc/trans/single_byte.c: Added conversions to/from
US-ASCII and ASCII-8BIT (using data tables).
* enc/trans/single_byte.c: Some spacing/ordering changes due to
automatic data file generation.
* transcode_data.h, transcode.c: Preliminary code for using
micro-conversion functions.
* test/ruby/test_transcode.rb: Added some tests for US-ASCII and
ASCII-8BIT conversions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
change "illegal" to "invalid" in a context which isn't against
a law.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
compiled output file name explicitly.
* enc/Makefile.in, enc/depend: now make the compiler put generated
files under directories corresponding to each source.
enc/trans supported.
* enc/make_encmake.rb: evaluates depend file before Makefile.in so
that the former can influence CONFIG.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encoding.c (Init_Encoding): ISO-8859-1 is no longer a replica.
* regenc.h (OnigEncodingDefine): names of extension and encoding can
differ.
* enc/Makefile.in: always shared.
* enc/depend (deffile): should not upcase.
* enc/{ascii,euc_jp,sjis,utf8,iso_8859_{1..16}}.c: fix for Init.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14376 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/Makefile.in: became a serb template.
* enc/make_encmake.rb: creates enc.mk from enc/Makefile.in using serb.
* lib/mkmf.rb (relative_from): moved from ext/extmk.rb.
* lib/mkmf.rb ($extmk): true if under the top source directory, not
only ext.
* lib/mkmf.rb (depend_rules): extracted from create_makefile.
* tool/serb.rb (serb): split from tool/compile_prelude.rb.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
validation.
* include/ruby/encoding.h (rb_enc_precise_mbclen): declared.
(MBCLEN_CHARFOUND): new macro.
(MBCLEN_INVALID): new macro.
(MBCLEN_NEEDMORE): new macro.
* include/ruby/oniguruma.h (OnigEncodingTypeST): replace mbc_enc_len
by precise_mbc_enc_len.
(ONIGENC_PRECISE_MBC_ENC_LEN): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_INVALID): new macro.
(ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBCLEN_CHARFOUND): new macro.
(ONIGENC_MBCLEN_INVALID): new macro.
(ONIGENC_MBCLEN_NEEDMORE): new macro.
(ONIGENC_MBC_ENC_LEN): use ONIGENC_PRECISE_MBC_ENC_LEN.
* enc/euc_jp.c: validation implemented.
* enc/sjis.c: ditto.
* enc/utf8.c: ditto.
* string.c (rb_str_inspect): use rb_enc_precise_mbclen for invalid
encoding.
(rb_str_valid_encoding_p): new method String#valid_encoding?.
* io.c (rb_io_getc): use rb_enc_precise_mbclen.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14119 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
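The new String#valid_encoding? method reports whether every byte sequence in a string is a complete, valid character in its encoding. A minimal sketch with hand-picked byte sequences, assuming a UTF-8 source file:

```ruby
# A complete UTF-8 character.
complete = "\u3042".valid_encoding?

# The first two bytes of a three-byte UTF-8 sequence: more bytes needed.
truncated = "\xE3\x81".b.force_encoding("UTF-8").valid_encoding?

# A4 A2 is a valid two-byte EUC-JP character; a lone lead byte A4 is not.
eucjp_ok = "\xA4\xA2".b.force_encoding("EUC-JP").valid_encoding?
eucjp_bad = "\xA4".b.force_encoding("EUC-JP").valid_encoding?

p complete, truncated, eucjp_ok, eucjp_bad
```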