Commit graph

283 commits

Author SHA1 Message Date
duerst 08631278ad Wed Mar 5 17:43:43 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c (transcode_loop): Adjusted detection of invalid
	  (ill-formed) UTF-8 sequences. Fixing potential security issue, see
	  http://www.unicode.org/versions/Unicode5.1.0/#Notable_Changes.

	* test/ruby/test_transcode.rb: Added two tests for above fix.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-03-05 08:45:51 +00:00
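The ill-formed-sequence handling described in the entry above can be observed from Ruby. This is a minimal sketch, assuming a current Ruby release rather than the 2008 trunk; the byte pair 0xC2 0x41 and all variable names are chosen here for illustration and do not come from the commit or its tests.

```ruby
# Assumption: run on a current Ruby; names below are illustrative only.
# 0xC2 starts a two-byte UTF-8 sequence and must be followed by a
# continuation byte (0x80..0xBF). 0x41 ("A") is not one, so the pair is
# ill-formed.
ill_formed = [0xC2, 0x41].pack("C*").force_encoding(Encoding::UTF_8)

is_valid = ill_formed.valid_encoding?

# Transcoding without options reports the ill-formed sequence as an error.
raised = begin
  ill_formed.encode(Encoding::UTF_16BE)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e.class
end

# Replacing the invalid byte must not swallow the well-formed "A" after it.
repaired = ill_formed
  .encode(Encoding::UTF_16BE, invalid: :replace, replace: "?")
  .encode(Encoding::UTF_8)

puts is_valid        # false
puts raised.inspect  # Encoding::InvalidByteSequenceError
puts repaired        # ?A
```

The last step is the security-relevant part: only the offending lead byte is treated as invalid, and the following well-formed byte is kept.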
duerst 6d5ef97a32 Thu Feb 21 17:15:15 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c: Added basic support for passing options to String#encode
	  via a hash. Currently only one option, with one value, is supported:
	  invalid: :ignore (dropping invalid byte sequences instead of
	  producing an error). Option naming is not yet stable!

	* test/ruby/test_transcode.rb: Added a single test for invalid: :ignore
	  option. Not more tests because most data does not yet distinguish
	  between INVALID and UNKNOWN.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15565 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-21 08:42:10 +00:00
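As a rough illustration of dropping invalid byte sequences, the sketch below uses the `invalid: :replace` option with an empty `replace:` string, which is the spelling documented for `String#encode` in later Ruby releases. That spelling is an assumption relative to this commit, which used `invalid: :ignore` and explicitly warned that option naming was not yet stable. The sample bytes and variable names are hypothetical.

```ruby
# Assumption: current Ruby option spelling, not the 2008 `invalid: :ignore`.
# "ab", one stray byte 0xFF (never valid in UTF-8), then "c".
dirty = [0x61, 0x62, 0xFF, 0x63].pack("C*").force_encoding(Encoding::UTF_8)

# Drop invalid byte sequences instead of raising an error.
cleaned = dirty.encode(Encoding::UTF_16LE, invalid: :replace, replace: "")

puts cleaned.encode(Encoding::UTF_8)  # abc
puts cleaned.bytesize                 # 6 (three UTF-16 code units)
```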
naruse 74b254e833 * enc/trans/japanese.c (rb_to_Windows_31J): to 'Windows-31J'.
* common.mk: add rules for transdb.h.

* transcode.c (init_transcoder_table): use transdb.h.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15317 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-29 10:05:39 +00:00
duerst 38321fc0eb Mon Jan 21 19:42:42 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c, enc/trans/utf_16_32.c, test/ruby/test_transcode.rb:
	  added UTF-32BE and UTF-32LE conversions.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15156 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-21 10:41:59 +00:00
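The difference between the two UTF-32 byte orders can be sketched as follows, assuming a current Ruby release. The sample text and the `hex` helper are illustrative and not taken from the commit.

```ruby
# Assumption: current Ruby; `hex` is a hypothetical helper for display.
# "A" is U+0041; U+1F600 lies outside the Basic Multilingual Plane.
text = "A\u{1F600}"

be = text.encode(Encoding::UTF_32BE)
le = text.encode(Encoding::UTF_32LE)

hex = ->(s) { s.bytes.map { |b| format("%02X", b) }.join(" ") }

# Every code point occupies exactly four bytes; only the byte order differs.
puts hex.call(be)  # 00 00 00 41 00 01 F6 00
puts hex.call(le)  # 41 00 00 00 00 F6 01 00
```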
nobu 282e828c59 * transcode.c (str_transcode): initialize transcoder in
rb_transcoding.  [ruby-dev:33234]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15153 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-21 05:36:16 +00:00
nobu f5eb90f3c2 * transcode.c (str_transcode): initialize transcoder in
rb_transcoding.  [ruby-dev:33234]

* transcode_data.h (rb_transcoding): transcoder constified.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15152 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-21 05:32:12 +00:00
nobu 463af63468 * transcode.c (transcode_loop, str_transcoding_resize): use unsigned
char.  [ruby-dev:33232]

* transcode_data.h (rb_transcoding, rb_transcoder): removed callback
  parameters.

* enc/trans/japanese.c: ditto.

* enc/trans/utf_16_32.c: parenthesized bit-or operands.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15150 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-21 03:35:05 +00:00
nobu a8969e999a * transcode.c (transcode_dispatch): constified return value.
* transcode_data.h (rb_transcoding): include pointer to rb_transcoder
  and auxiliary data.

* transcode_data.h (rb_transcoder): all callback functions should have
  their own parameters.

* enc/trans/{japanese,single_byte}.c: constified.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-20 21:40:08 +00:00
duerst a9b15a4e0c Sun Jan 20 20:00:20 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c, enc/trans/utf_16_32.c, test/ruby/test_transcode.rb:
	  added UTF-16LE conversions.

	* fixed changelog for last commit



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15144 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-20 11:00:24 +00:00
duerst 3d0c7bea4d Sun Jan 20 15:08:08 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
* enc/trans/utf_16_32.c: new file, currently implementing
	  UTF-16BE conversions only.

	* test/ruby/test_transcode.rb: Added tests for UTF-16BE;
	  made check_both_ways() use force_encoding differently.

	* transcode_data.h, transcode.c: Support for more conversion
	  functions.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15142 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-20 06:12:48 +00:00
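For code points above U+FFFF, a UTF-16 conversion has to emit a surrogate pair. The sketch below computes the pair by hand and compares it with the result of `String#encode` on a current Ruby release; the chosen code point and variable names are illustrative assumptions, not part of the commit.

```ruby
# Assumption: current Ruby; the code point is an arbitrary non-BMP example.
code_point = 0x1F600

# Standard UTF-16 surrogate arithmetic.
offset = code_point - 0x10000
high   = 0xD800 + (offset >> 10)    # high (lead) surrogate
low    = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate

# "n*" packs each value as a big-endian 16-bit unit.
expected = [high, low].pack("n*")
actual   = [code_point].pack("U").encode(Encoding::UTF_16BE)

puts format("%04X %04X", high, low)    # D83D DE00
puts expected.bytes == actual.bytes    # true
```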
nobu ce2652c9d4 * include/ruby/intern.h (rb_str_tmp_new, rb_str_shared_replace):
prototype moved.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-16 03:51:32 +00:00
akr 6cdef2dc7e * $Date$ keyword removed to avoid inclusion of locale dependent
string.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 15:49:38 +00:00
akr 041e829127 * include/ruby/encoding.h (rb_isascii): defined.
(rb_isalnum): ditto.
  (rb_isalpha): ditto.
  (rb_isblank): ditto.
  (rb_iscntrl): ditto.
  (rb_isdigit): ditto.
  (rb_isgraph): ditto.
  (rb_islower): ditto.
  (rb_isprint): ditto.
  (rb_ispunct): ditto.
  (rb_isspace): ditto.
  (rb_isupper): ditto.
  (rb_isxdigit): ditto.
  (rb_tolower): ditto.
  (rb_toupper): ditto.

* include/ruby/st.h (st_strcasecmp): declared.
  (st_strncasecmp): ditto.

* st.c (type_strcasehash): use st_strcasecmp instead of strcasecmp.
  (st_strcasecmp): defined.
  (st_strncasecmp): ditto.

* include/ruby/ruby.h: include include/ruby/encoding.h.
  (ISASCII): use rb_isascii.
  (ISPRINT): use rb_isprint.
  (ISSPACE): use rb_isspace.
  (ISUPPER): use rb_isupper.
  (ISLOWER): use rb_islower.
  (ISALNUM): use rb_isalnum.
  (ISALPHA): use rb_isalpha.
  (ISDIGIT): use rb_isdigit.
  (ISXDIGIT): use rb_isxdigit.
  (TOUPPER): defined.
  (TOLOWER): ditto.
  (STRCASECMP): ditto.
  (STRNCASECMP): ditto.

* dir.c, encoding.c, file.c, hash.c, process.c, ruby.c, time.c,
  transcode.c, ext/readline/readline.c: use locale insensitive
  functions.  [ruby-core:14662]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14829 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-01 12:24:04 +00:00
duerst 793e9423cd Fri Dec 28 01:55:04 2007 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c (transcode_dispatch): reverted some of the changes
          in r14746.

	* transcode.c, enc/trans/single_byte.c: Added conversions to/from
	  US-ASCII and ASCII-8BIT (using data tables).

	* enc/trans/single_byte.c: Some spacing/ordering changes due to
	  automatic data file generation.

	* transcode_data.h, transcode.c: Preliminary code for using
	  micro-conversion functions.

	* test/ruby/test_transcode.rb: Added some tests for US-ASCII and
	  ASCII-8BIT conversions.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-28 09:26:55 +00:00
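A small sketch of converting to US-ASCII, assuming a current Ruby release (behavior in the 2007 trunk may have differed, and ASCII-8BIT is not covered here). The sample strings and variable names are hypothetical.

```ruby
# Assumption: current Ruby behavior; sample strings are illustrative.
plain = "transcode"
ascii = plain.encode(Encoding::US_ASCII)

# Pure 7-bit text converts with its bytes unchanged.
puts ascii.encoding               # US-ASCII
puts ascii.bytes == plain.bytes   # true

# A character with no US-ASCII equivalent is an undefined conversion.
offending = begin
  "caf\u00E9".encode(Encoding::US_ASCII)
  nil
rescue Encoding::UndefinedConversionError => e
  e.error_char
end

puts offending.inspect
```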
nobu 3b83e10790 * transcode.c (transcode_dispatch): allows transcoding from/to
ASCII-8BIT.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14746 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-27 16:55:06 +00:00
akr efd7504d44 * parse.y, transcode_data.h, transcode.c: change "illegal" to
"invalid" in a context which doesn't go against a law.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14735 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-27 08:27:19 +00:00
nobu 3e65d11090 * transcode.c (transcode_dispatch): fix for multistep transcode.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-25 06:21:35 +00:00
nobu b7db9036be * common.mk (COMMONOBJS): transcode_data_*.c moved under enc/trans.
* transcode_data.h (rb_transcoding, rb_transcoder): prefixed.

* transcode.c (rb_register_transcoder, rb_declare_transcoder): split
  declaration and registration.  [ruby-dev:32704]

* transcode.c (transcode_dispatch): autoload pre-declared transcoder.

* transcode.c (str_transcode): use rb_define_dummy_encoding().

* transcode.c (Init_transcode): initialize transcoder tables.

* enc/trans/single_byte.c, enc/trans/japanese.c: moved from top.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-25 05:57:04 +00:00
duerst e7ac333ba8 Tue Dec 25 12:32:32 2007 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c: Moving a static counter from inside register_transcoder()
	  and register_functional_transcoder() to outside the functions, renaming
	  from n to next_transcoder_position. Fixes 3) in [ruby-dev:32715].



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-25 03:33:23 +00:00
naruse be86e3de33 * transcode.c: register_functional_transcoder() added.
(init_transcoder_table): register ISO-2022-JP.
  (str_transcode): add preprocessor and postprocessor.

* transcode_data_japanese.c: add ISO-2022-JP support.

* transcode_data.h: moved transcoder and transcoding definition from
  transcode.c.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14607 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-24 13:51:19 +00:00
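ISO-2022-JP is a 7-bit, stateful encoding that switches character sets with escape sequences, which is why the conversion needs extra processing around the table lookup. The sketch below shows the visible effect on a current Ruby release, assuming the ISO-2022-JP transcoder is available in the installation; the sample character and names are illustrative.

```ruby
# Assumption: current Ruby with the ISO-2022-JP transcoder installed.
hiragana_a = "\u3042"  # HIRAGANA LETTER A

jis = hiragana_a.encode("ISO-2022-JP")

# The output starts with an escape sequence (ESC = 0x1B) and stays 7-bit.
puts jis.bytes.map { |b| format("%02X", b) }.join(" ")

round_trip = jis.encode(Encoding::UTF_8)
puts round_trip == hiragana_a
```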
duerst 9c7718ac6b Mon Dec 24 09:45:45 2007 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode.c, transcode_data_one_byte.c, transcode_data_japanese.c:
	  added rb_ prefix to external data symbols.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14561 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-24 00:47:21 +00:00
akr 5b809a28f8 * include/ruby/encoding.h, encoding.c, re.c, io.c, parse.y, numeric.c,
ruby.c, transcode.c: rename rb_ascii_encoding to
  rb_ascii8bit_encoding.  rb_ascii_encoding is ambiguous with
  ASCII-8BIT and US-ASCII.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 23:47:18 +00:00
duerst 5ad8c5566d Sat Dec 22 15:45:45 2007 Martin Duerst <duerst@it.aoyama.ac.jp>
* transcode_data_one_byte: slightly optimized

	* transcode_data_japanese: new data file for EUC-JP and SHIFT_JIS
	  (not yet optimized; tests to follow; data from
	   http://nkf.sourceforge.jp/ucm/{SJIS|eucJP}-nkf.ucm)

	* common.mk, transcode.c: Adjusted for transcode_data_japanese



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14472 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 06:45:55 +00:00
matz d7cc14d436 * encoding.c (rb_ascii_encoding): renamed from previous
rb_default_encoding().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14443 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 18:55:30 +00:00
nobu d66a188c4a * transcode.c (rb_str_transcode_bang): returns self if no conversion.
[ruby-dev:32662]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14425 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 08:49:08 +00:00
nobu 98da73bc96 * transcode.c (rb_str_transcode_bang, rb_str_transcode): set new
encoding even if no conversion is done because of 7bit only.
  [ruby-dev:32591]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-18 08:04:26 +00:00
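The two entries above concern the case where no bytes need converting. A minimal sketch of the resulting behavior, assuming a current Ruby release; the variable names are illustrative.

```ruby
# Assumption: current Ruby; names are illustrative only.
seven_bit = +"abc"  # unary plus yields an unfrozen string

# Pure 7-bit text needs no byte changes, but the result still carries
# the requested encoding.
relabeled = seven_bit.encode(Encoding::ISO_8859_1)
puts relabeled.encoding                   # ISO-8859-1
puts relabeled.bytes == seven_bit.bytes   # true

# The destructive variant returns the receiver itself.
same_object = seven_bit.encode!(Encoding::UTF_8).equal?(seven_bit)
puts same_object                          # true
```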
matz 5c4cf9bfdf for undefined conversions.
* transcode_data_iso_8859.c: Changed from character constants
  ('\xC2') to integer constants (0xC2) for shorter files and
  better readability; eliminated duplicated tables; changed
  from -1 offset to actual UNDEF entry (not yet distinguishing
  UNDEF and ILLEGAL correctly).

* test/ruby/test_transcode.rb: added a test for UNDEF conversion.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14251 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-17 01:28:26 +00:00
matz f2b0dba1cf * transcode.c (str_transcode, transcode_dispatch): added two-step
* transcode.c: some minor formatting fixes

* transcode_data.h, transcode_data_iso_8859.c: Shortened
  extremely frequently used macros to shorten file length.

* test/ruby/test_transcode.rb: Fixed name of test class;
  added setup method to ensure all necessary encodings exist;
  split tests into more test methods; added tests; fixed ordering
  of arguments in assert_equal to have expected result first.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14236 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-15 05:42:25 +00:00
nobu a153fdb6ed * transcode.c (transcode_loop): get rid of SEGV when a sequence cannot be
converted.

* transcode.c (rb_str_transcode_bang): copy encoding.  [ruby-dev:32532]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14191 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-11 04:57:51 +00:00
nobu 6d7999c132 * transcode.c (str_transcode): allow non-registered encodings.
[ruby-dev:32520]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14182 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 12:47:55 +00:00
nobu 3a3bda73dd * string.c (rb_str_tmp_new): creates hidden temporary buffer.
* transcode.c (transcoding): added a pointer to function to flush.

* transcode.c (transcode_loop): do not use string internal.
  [ruby-dev:32512]

* transcode.c (str_transcode): allow Encoding objects.

* transcode_data.h (BYTE_LOOKUP): use actual struct name.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14176 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 08:46:06 +00:00
nobu 38b92f838f * transcode*.[ch], test/ruby/test_transcode.rb: set properties.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14175 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 08:25:01 +00:00
matz 7ded13f54b * transcode.c: new file to provide encoding conversion features.
code contributed by Martin Duerst.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14172 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 05:01:47 +00:00