enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Add Age property to regexp. [ruby-core:33019]
patched by Ammar Ali, tested by Run Paint Run Run
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29717 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Update Oniguruma for Unicode 6.
patched by Run Paint Run Run. [ruby-core:32923] #3989
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29620 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
--
* enc/Makefile.in (realclean): has been missing. necessary
for make realclean-enc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29177 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/trans/single_byte.trans: remove ambiguous maping such as
\xD6 -> U+05F2 and \xD6\xC7 -> U+FB1F in Windows-1255
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
support for new transcoding instruction FUNsio (with Tatsuya Mizuno)
* enc/trans/gb18030.trans: Significantly reduced GB18030 conversion
table footprint using FUNsio and differences (with Tatsuya Mizuno)
* test/ruby/test_transcode.rb: Minor name fix (from Tatsuya Mizuno)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(from Tatsuya Mizuno)
* test/ruby/test_transcode.rb: Added test for converting full range of
Unicode codepoints from/to GB18030 (from Tatsuya Mizuno)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25980 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
after \r\n detection instead of just after \r.
[ruby-list:45988] [ruby-core:25881] [ruby-core:26788]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25883 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
test/ruby/test-transcode.rb: Added Encoding 'Big5-UAO' and transcoding
for it (from Tatsuya Mizuno) (see Bug #1784)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25822 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
use UTS#18 for POSIX character class.
http://rubyspec.org/issues/show/161
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25338 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/enc-unicode.rb,
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Add DerivedCoreProperties, PropList (Binary Property),
PropertyAlias and PropertyValueAlias.
Now users of tool/enc-unicode.rb should specify
the directory of UCD files.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/name2ctype.h.blt, enc/unicode/name2ctype.kwd,
enc/unicode/name2ctype.src: Updated to Unicode 5.2.0.
NOTE: when you update these data, download UnicodeData.txt
and Scripts.txt from http://www.unicode.org/Public/UNIDATA/
and run
ruby1.9 tool/enc-unicode.rb UnicodeData.txt Scripts.txt \
> enc/unicode/name2ctype.kwd
* enc/unicode/Scripts.txt: removed.
* enc/unicode/UnicodeData.txt: removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25190 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/enc-unicode.rb: added for generate name2ctype.kwd.
contributed by Run Paint Run Run [ruby-core:24775]
use like following:
ruby19 tool/enc-unicode.rb enc/unicode/UnicodeData.txt \
enc/unicode/Scripts.txt > enc/unicode/name2ctype.kwd
* enc/unicode.c (CodeRanges): move definitions to name2ctype.h.
* enc/unicode/name2ctype.h.blt, enc/unicode/name2ctype.kwd,
enc/unicode/name2ctype.src: updated to v5.1.
* enc/unicode/UnicodeData.txt, enc/unicode/Scripts.txt: added v5.1.
* Makefile.in: add rule to generate name2ctype.kwd from
UnicodeData.txt and Scripts.txt.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/big5.c (EncLen_Big5): back to original Big5 table.
(EncLen_Big5_HKSCS): for Big5-HKSCS.
(trans): add the lead byte table for Big5-HKSCS.
(big5_mbc_enc_len): abstract function for Big5 series.
(big5_mbc_enc_len): for Big5.
(big5_hkscs_mbc_enc_len): for Big5-HKSCS.
(BIG5_HKSCS_P): added.
(BIG5_ISMB_FIRST): add routine for Big5-HKSCS.
(big5_hkscs): add for Big5-HKSCS.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encoding.c (rb_enc_set_base): Add for setting base encoding
with their names. this is internal function.
* template/encdb.h.tmpl: specify ENC_SET_BASE for second
encodings in each encoding files.
* enc/encdb.c (rb_enc_set_base): add a declaration.
(ENC_SET_BASE): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
new Chinese BIG5-HKSCS transcoding (with Tatsuya Mizuno)
* test/ruby/test_transcode.rb: added tests for the above
(with Tatsuya Mizuno)
* enc/big5.c: Added BIG5-HKSCS as a replicate encoding of BIG5
(short term solution, needs more work; with Tatsuya Mizuno)
* tool/transcode-tblgen.rb: made 'pat' directly accessible in
class StrSet
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24264 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
new Chinese GBK transcoding (from Yoshihiro Kambayashi)
* test/ruby/test_transcode.rb: added tests for the above
(from Yoshihiro Kambayashi)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21315 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
new Chinese Big5 transcoding (from Yoshihiro Kambayashi)
* test/ruby/test_transcode.rb: added tests for the above
(from Yoshihiro Kambayashi)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to add more transcodings (with Yoshihiro Kambayashi)
* enc/trans/iso-8859-1-tbl.rb: new file to avoid having to
treat ISO-8859-1 as special
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@20054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Tadashi Saito <shiba AT mail2.accsnet.ne.jp> at [ruby-dev:36916]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to reduce coupling between table generation script and
specific encodings.
* enc/trans/single_byte.trans: using set_valid_byte_pattern
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19831 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
from transcode_tblgen_windows.
(transcode_tblgen_iso8859): use transcode_tblgen_singlebyte.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
also returns ssize_t.
* enc/trans/iso2022.trans: follow the type change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19354 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode.c (transcode_restartable0): don't need to cast the result
of output functions.
* enc/trans/newline.trans: follow the type change.
* enc/trans/escape.trans: ditto.
* enc/trans/utf_16_32.trans: ditto.
* enc/trans/iso2022.trans: ditto.
* enc/trans/japanese.trans: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19351 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
from rb_econv_stateless_encoding to apply stateless ASCII
incompatible encodings such as UTF-16BE.
* io.c (make_writeconv): use rb_econv_asciicompat_encoding.
* transcode_data.h (rb_transcoder_asciicompat_type_t): renamed from
rb_transcoder_stateful_type_t.
(rb_transcoder): use rb_transcoder_asciicompat_type_t.
* transcode.c: follow the type change.
(asciicompat_encoding_i): renamed from stateless_encoding_i.
(rb_econv_asciicompat_encoding): renamed from
rb_econv_stateless_encoding.
(econv_s_asciicompat_encoding): method renamed.
* tool/transcode-tblgen.rb: follow the type change.
* enc/trans/utf_16_32.trans: follow the type change.
rb_from_UTF_16BE to UTF-8 is asciicompat_decoder.
rb_from_UTF_16LE to UTF-8 is asciicompat_decoder.
rb_from_UTF_32BE to UTF-8 is asciicompat_decoder.
rb_from_UTF_32LE to UTF-8 is asciicompat_decoder.
UTF-8 to rb_to_UTF_16BE is asciicompat_encoder.
UTF-8 to rb_to_UTF_16LE is asciicompat_encoder.
UTF-8 to rb_to_UTF_32BE is asciicompat_encoder.
UTF-8 to rb_to_UTF_32LE is asciicompat_encoder.
* enc/trans/newline.trans: follow the type change. universal newline
decoder is asciicompat_converter.
* enc/trans/escape.trans: follow the type change.
* enc/trans/iso2022.trans: ditto.
* enc/trans/japanese.trans: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19249 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
undefined conversion error between iso-2022-jp and the corresponding
stateless encoding.
* enc/emacs_mule.c: replicate emacs-mule as stateless-iso-2022-jp.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19220 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/transcode-tblgen.rb: generate an empty line after str1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(fun_so_escape_html_attr): new function.
(escape_html_attr_finish): new function.
(rb_escape_html_attr): use them to quote the converted result.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19173 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
add state field.
(TRANSCODING_STATE): defined.
(rb_transcoder): add fields: state_size, state_init_func,
state_fini_func.
change rb_transcoding* argument to void*.
* transcode.c (transcode_restartable0): use TRANSCODING_STATE for
first arguments of transcoder functions.
(rb_transcoding_open_by_transcoder): initialize state field.
(rb_transcoding_close): finalize state field.
* tool/transcode-tblgen.rb: provide state size/init/fini.
* enc/trans/newline.trans (universal_newline_init): defined.
(fun_so_universal_newline): take void* as a state pointer.
(rb_universal_newline): provide state size/init/fini.
(rb_crlf_newline): ditto.
(rb_cr_newline): ditto.
* enc/trans/iso2022.trans (iso2022jp_init): defined.
(fun_si_iso2022jp_to_eucjp): take void* as a state pointer.
(fun_so_iso2022jp_to_eucjp): ditto.
(fun_so_eucjp_to_iso2022jp): ditto.
(iso2022jp_reset_sequence_size): ditto.
(finish_eucjp_to_iso2022jp): ditto.
(rb_ISO_2022_JP_to_EUC_JP): provide state size/init/fini.
(rb_EUC_JP_to_ISO_2022_JP): ditto.
* enc/trans/utf_16_32.trans (fun_so_from_utf_16be): take void* as a
state pointer.
(fun_so_to_utf_16be): ditto.
(fun_so_from_utf_16le): ditto.
(fun_so_to_utf_16le): ditto.
(fun_so_from_utf_32be): ditto.
(fun_so_to_utf_32be): ditto.
(fun_so_from_utf_32le): ditto.
(fun_so_to_utf_32le): ditto.
(rb_from_UTF_16BE): provide state size/init/fini.
(rb_to_UTF_16BE): ditto.
(rb_from_UTF_16LE): ditto.
(rb_to_UTF_16LE): ditto.
(rb_from_UTF_32BE): ditto.
(rb_to_UTF_32BE): ditto.
(rb_from_UTF_32LE): ditto.
(rb_to_UTF_32LE): ditto.
* enc/trans/japanese.trans (fun_so_eucjp2sjis): take void* as a state
pointer.
(fun_so_sjis2eucjp): ditto.
(rb_eucjp2sjis): provide state size/init/fini.
(rb_sjis2eucjp): provide state size/init/fini.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19096 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
code. use it in rb_transcoder.
* enc/trans/newline.trans: use TRANSCODE_TABLE_INFO.
* enc/trans/iso2022.trans: ditto.
* enc/trans/utf_16_32.trans: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
eucJP-ms.
* enc/trans/japanese.trans (eucJP-ms): eucJP-ms is the correct
name of the encoding in Ruby.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/encoding.h (rb_econv_t): new field: flags.
(rb_econv_binmode): declared.
* io.c (io_unread): text mode hack removed.
(NEED_NEWLINE_DECODER): defined.
(NEED_NEWLINE_ENCODER): defined.
(NEED_READCONV): defined.
(NEED_WRITECONV): defined.
(TEXTMODE_NEWLINE_ENCODER): defined for windows.
(make_writeconv): setup converter with TEXTMODE_NEWLINE_ENCODER for
text mode.
(io_fwrite): use NEED_WRITECONV. character code conversion is
disabled if fptr->writeconv_stateless is nil.
(make_readconv): setup converter with
ECONV_UNIVERSAL_NEWLINE_DECODER for text mode.
(read_all): use NEED_READCONV.
(appendline): use NEED_READCONV.
(rb_io_getline_1): use NEED_READCONV.
(io_getc): use NEED_READCONV.
(rb_io_ungetc): use NEED_READCONV.
(rb_io_binmode): OS-level text mode test removed. call
rb_econv_binmode.
(rb_io_binmode_m): call rb_io_binmode_m with write_io as well.
(rb_io_flags_mode): return mode string including "t".
(rb_io_mode_flags): detect "t" for text mode.
(rb_sysopen): always specify O_BINARY.
* transcode.c (rb_econv_open_by_transcoder_entries): initialize flags.
(rb_econv_open): if source and destination encoding is
both empty string, open newline converter. last_tc will be NULL in
this case.
(rb_econv_encoding_to_insert_output): last_tc may be NULL now.
(rb_econv_string): ditto.
(output_replacement_character): ditto.
(transcode_loop): ditto.
(econv_init): ditto.
(econv_inspect): ditto.
(rb_econv_binmode): new function.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
in_data_start, in_data_end, in_buf_end and last_trans_index.
(rb_econv_output): removed.
(rb_econv_insert_output): declared.
(rb_econv_encoding_to_insert_output): declared.
* enc/trans/newline.trans (rb_universal_newline): stateful_type
changed.
* transcode.c (transcode_restartable0): initialize inchar_start,
tc->recognized_len and next_table at beginning of the loop.
(rb_econv_open_by_transcoder_entries): initialize new fields.
(rb_econv_open): setup last_trans_index.
(trans_sweep): last out_buf_start can be non-NULL now.
(rb_econv_convert): check last out_buf_start and in_buf_start at
first.
(rb_econv_output_with_destination_encoding): removed.
(econv_just_convert): removed.
(rb_econv_output): removed.
(econv_primitive_output): method removed.
(rb_econv_encoding_to_insert_output): new function.
(allocate_converted_string): new function.
(rb_econv_insert_output): new function.
(econv_primitive_insert_output): new method.
(output_replacement_character): use rb_econv_insert_output. unused
arguments removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18654 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transcode_data.h (rb_transcoder): add resetsize_func field.
* enc/trans/iso2022.trans (iso2022jp_reset_sequence_size): defined.
(rb_EUC_JP_to_ISO_2022_JP): provede resetsize_func.
* tool/transcode-tblgen.rb: set NULL for resetsize_func.
* transcode.c (rb_econv_output): new function for inserting output.
(output_replacement_character): use rb_econv_output.
(transcode_loop): check return value of
output_replacement_character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rb_cr_newline): new transcoder.
* transcode.c (trans_open_i): one more exra room for input newline
converter.
(rb_trans_open): crlf newline and cr newline implemented.
(Init_transcode): Encoding::Converter::CRLF_NEWLINE and
Encoding::Converter::LF_NEWLINE defined.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
resetting a state of stateful encoding.
* enc/trans/iso2022.trans (rb_EUC_JP_to_ISO_2022_JP): specify
finish_eucjp_to_iso2022jp for resetstate_func.
* tool/transcode-tblgen.rb: specify NULL for resetstate_func.
* transcode.c (output_replacement_character): call resetstate_func
before appending the replacement character.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@18503 b2dd03c8-39d4-4d8f-98ff-823fe69b080e