Граф коммитов

162 Коммитов

Автор SHA1 Сообщение Дата
Lars Kanis d403591b34
Add string encoding IBM720 alias CP720 (#3803)
The mapping table is generated from the ICU project:
  https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-720_P100-1997.ucm

Fixes bug 16233 : https://bugs.ruby-lang.org/issues/16233
2020-11-22 22:23:40 +09:00
Jeremy Evans ddd9704ae9 Encode ' as ' when using encode(xml: :attr)
Fixes [Bug #16922]
2020-07-10 09:34:08 -07:00
卜部昌平 115fec062c more on NULL versus functions.
Function pointers are not void*.  See also
ce4ea956d2
8427fca49b
2020-02-07 14:24:19 +09:00
Nobuyoshi Nakada cc87037f1c
Fixed misspellings
Fixed misspellings reported at [Bug #16437], missed and a new
typo.
2019-12-22 22:49:17 +09:00
Martin Dürst 369ff79394 add encoding conversion from/to CESU-8
Add encoding conversion (transcoding) from UTF-8 to CESU-8
and back. CESU-8 is an encoding similar to UTF-8, but encodes
codepoints above U+FFFF as two surrogates, these surrogates
again being encoded as if they were UTF-8 codepoints. This
preserves the same binary sorting order as in UTF-16. It is
also somewhat similar (although not exactly identical) to an
encoding used internally by Java.

This completes issue #15995.

enc/trans/cesu_8.trans: Add encoding conversion from/to CESU-8
test/ruby/test_transcode.rb: Add tests for above
2019-07-14 10:58:50 +09:00
duerst 3d46d51c45 replace copyrights by explanatory text in data files for GB2312/GB12345 mappings
Replace the copyrights and explanatory texts in the data files used for mapping
GB2312/GB12345 to/from Unicode with short explanatory texts. [Bug #12598] [ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59715 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-01 10:22:09 +00:00
nobu b24e093296 Update windows-1255 table
* enc/trans/windows-1255-tbl.rb: update mapping from 0xCA to
  U+05BA.  [Feature #12877]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 15:14:32 +00:00
nobu 06711fd1e3 single_byte.trans: dead code
* enc/trans/single_byte.trans (transcode_tblgen_singlebyte):
  remove useless code.  returned value is not used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 14:18:52 +00:00
svn 5c725ba9fe * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54130 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-16 12:42:16 +00:00
naruse 623dde6ce7 * enc/trans/JIS: update Unicode's notice. [Bug #11844]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54129 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-16 12:42:15 +00:00
naruse abfc03c6cf follow the change of the name
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53128 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-15 13:11:33 +00:00
duerst 83304b75c1 * enc/ebcdic.h: new dummy encoding EBCDIC-US
* enc/trans/ebcdic.trans: transcodings between EBCDIC-US
  and iso-8859-1 [with code from Andrea Ribuoli]
* test/ruby/test_transcode.rb: tests for above
* tool/transcode_tablegen.rb: additional argument for
  method transcode_tblgen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-14 13:11:31 +00:00
nobu f587347c28 euckr-tbl.rb: euro and registered signs
* enc/trans/euckr-tbl.rb (EUCKR_TO_UCS_TBL): add missing euro and
  registered signs.  [ruby-core:64452] [Bug #10149]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47221 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-08-19 13:22:46 +00:00
nobu bd403755d0 remove trailing spaces
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-22 10:58:08 +00:00
naruse 0e92ae9636 * enc/trans/utf8_mac-tbl.rb: fix r42789.
Fix conversion table and logic. [ruby-dev:47680]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42823 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-09-04 06:40:39 +00:00
naruse 853c346dde * enc/trans/utf8_mac-tbl.rb: update conversion table to recent OS X.
Previous table is used on Mac OS X 10.1 or prior.
  This table is used on 10.2 or later. [ruby-dev:47680]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42789 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-09-02 23:36:01 +00:00
ktsj 166d8dc2d6 * enc/trans/japanese_euc.trans, test/ruby/test_transcode.rb,
tool/transcode-tblgen.rb: change EUC-JP-2004 to EUC-JIS-2004.
  This is follow up to changes in r41024.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@41035 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-06-02 14:36:41 +00:00
yugui 3fa3f9abb9 Supports static linking of extensions and encodings again.
Fixes --with-static-linked-ext.

Patch by Google Inc. [ruby-core:45073].

* Makefile.in (ENCOBJS, EXTOBJS): New variables to specify static
  linked libraries. Also reintroduces extinit.o, introduces encinit.o
  introduces encinit.o

* common.mk: Builds static libraries rather than shared objects if
  specified.

* configure.in (LD): new substitution. 
  Avoids PIE if s

* enc/depend: Supports static linked libraries
  (libencs, libenc, libtrans): New target.

* enc/encinit.c.erb: new template to generate the initialization of
  statically linked encodings.

* enc/make_encmake.rb (--module): new flag to specify whether static
  or dynamic.

* transcode_data.h (TRANS_INIT): New macro to get rid of the name
  collision of encoding initializers and transcoder initializers.

* ext/extmk.rb: Fixes the behavior on $extstatic is true.

* lib/mkmf.rb (clean-static): new target to clean up static linked
  libraries.

* ruby.c (process_options): New initializes statically linked
  encodings here.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35662 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-05-16 05:39:06 +00:00
usa 756ffef448 * enc/euc_jp.c: added EUC-JP-2004 and its alias EUC-JISX0213.
[ruby-dev:45571] [Feature #6349]
  Requested by Kyouhei Yanagita <yanagi@shakenbu.org>.

* enc/trans/japanese_euc.trans: ditto.

* enc/trans/JIS/JISX0213-[12]%UCS@{BMP,SIP}.src: JIS X 0213:2004 ->
  Unicode mapping table from NetBSD.

* enc/trans/JIS/UCS@{BMP,SIP}%JISX0213-[12].src: Unicode -> JIX X
  0213:2004 mapping table from NetBSD.

* tool/transcode-tblgen.rb: added SIP support.

* test/ruby/test_transcode.rb: tests of above changes.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-04-24 11:14:18 +00:00
naruse c1d369b0ab * enc/trans/iso-8859-16-tbl.rb: add ISO-8859-16 converter.
* enc/trans/single_byte.trans: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@33993 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-12-09 10:27:37 +00:00
nobu 2acc71b2d5 * enc/trans/ibm737-tbl.rb: greek code page. fixes #4738
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@31644 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-05-19 15:58:09 +00:00
naruse 06911f90ce * enc/trans/emoji_iso2022_kddi.trans: ISO-2022-JP-KDDI doesn't have
CP932 UDA. Another reason is emacs-mule: the implementation of
  stateless-iso-2022-jp doesn't support beyond 94x94 (0x7fxx);
  but CP932 UDA is in 7Fxx-92xx.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@31366 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-04-27 02:45:36 +00:00
akr 113de0083e * enc/trans/utf8_mac.trans: parenthesize macro arguments.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-02-04 10:14:53 +00:00
naruse b98ea1505c * enc/trans/big5-hkscs-tbl.rb: Update table as HKSCS-2008.
patched by oCameLo oTnTh [ruby-core:33256]

* enc/big5.c: add alias Big5-HKSCS:2008 to Big5-HKSCS.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-24 16:40:38 +00:00
naruse 38b482be8c * enc/trans/utf_16_32.trans: add the UTF-32 converter.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29895 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-24 00:08:04 +00:00
naruse 7f38397b6c * enc/trans/utf_16_32.trans: add a convert from UTF-8 to UTF-16.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-23 20:49:56 +00:00
naruse 3ab82a65d7 * enc/trans/utf_16_32.trans: raise error on unpaired upper
surrogates.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-23 18:23:03 +00:00
naruse 78bee9c26a * enc/utf_16_32.h: add UTF-16 and UTF-32 as a dummy encoding.
* enc/trans/utf_16_32.trans: add a converter from UTF-16 to UTF-8.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29889 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-23 16:42:47 +00:00
naruse 5d8a64b1af Add missing tables.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29871 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-22 11:35:55 +00:00
naruse 60dfa6b655 * enc/big5.c: split CP950 from Big5.
* enc/big5.c: split CP951 from Big5-HKSCS.

* enc/trans/big5.trans: import conversion table of Big5, Big5-HKSCS,
  CP950, and CP951 from ICU. they need fallback conversions.
  ref [ruby-core:33256]
  http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/

* tool/transcode-tblgen.rb (import_ucm): add to import ucm files.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29869 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-22 09:35:08 +00:00
naruse 7b5e9245ac * enc/trans/gbk-tbl.rb: Add euro sign. [ruby-core:33094]
CP936, which is de facto definition of GBK, has it.
  http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29713 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-11-08 00:50:13 +00:00
naruse 78f5b54f1b * enc/trans/utf8_mac.trans (buf_apply): fix for patterns
whose result is 2 bytes. [ruby-core:30751]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@28307 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-06-12 17:13:54 +00:00
naruse f8d97b0026 * enc/iso_2022_jp.h: add CP50220.
* enc/trans/iso2022.trans: add converter for CP50220.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-05-17 06:28:16 +00:00
naruse afd64aafd1 * enc/trans/iso2022.trans: CP50221 supports 8bit JIS.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-04-01 08:18:38 +00:00
muraken e4d8dc5c46 * bignum.c, node.h, strftime.c, enc/trans/utf8_mac.trans: added explicit casts for supplessing warnings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27040 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-25 03:08:28 +00:00
akr 49d993729f * tool/transcode-tblgen.rb (transcode_compile_tree): make
valid_encoding mandatory unless from_encoding is registered in
  ValidEncoding.
  (transcode_tbl_only): ditto.
  (transcode_tblgen): ditto.
  (ValidEncoding): new function.

* enc/trans/escape.trans: specify valid_encoding.

* enc/trans/emoji_sjis_docomo.trans: ditto.

* enc/trans/emoji.trans: ditto.

* enc/trans/emoji_iso2022_kddi.trans: ditto.

* enc/trans/big5.trans: ditto.

* enc/trans/emoji_sjis_softbank.trans: ditto.

* enc/trans/emoji_sjis_kddi.trans: ditto.

* enc/trans/chinese.trans: use ValidEncoding() instead of
  ValidEncoding[].


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26995 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-21 03:38:58 +00:00
muraken 04d90693dc * enc/trans/emoji.trans: added codepoints leading 0xf4 into nomap_table.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26955 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-16 11:18:03 +00:00
akr a73374bb57 * tool/transcode-tblgen.rb (transcode_tblgen): add valid_encoding
optional argument.

* enc/trans/single_byte.trans use valid_encoding argument for
  transcode_tblgen.

* enc/trans/chinese.trans: ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-15 12:25:20 +00:00
akr ff39d22c33 * enc/trans/emoji.trans: fix nomap_table.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-14 06:49:22 +00:00
akr fa37ab769f * tool/transcode-tblgen.rb: reject ambiguous mapping.
* enc/trans/single_byte.trans: remove ambiguous maping such as
  \xD6 -> U+05F2 and \xD6\xC7 -> U+FB1F in Windows-1255


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-13 17:54:43 +00:00
muraken 62f8df2d3c * enc/trans/EMOJI/*.src, enc/trans/emoji*, enc/x-emoji.c, test/ruby/enc/test_emoji.rb, tool/enc-emoji-citrus-gen.rb, tool/enc-emoji4unicode.rb, tool/jisx0208.rb, tool/test/test_jisx0208.rb: new encodings to support emoji charsets, which are used by Japanese mobile phones [ruby-dev:40528]. Thanks Yoji Shidara for a lot of contribution.
* tool/transcode-tblgen.rb: modified for enc-emoji4unicode.rb.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26856 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-09 09:15:42 +00:00
naruse 6899b6ff80 * enc/trans/utf8_mac.trans (buf_shift_char): don't see uninitialised
value. [ruby-dev:40233]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26464 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-01-29 00:56:10 +00:00
duerst b32ee85f97 * transcode_data.h, transcode.c, tool/transcode-tblgen.rb: Added
support for new transcoding instruction FUNsio (with Tatsuya Mizuno)

* enc/trans/gb18030.trans: Significantly reduced GB18030 conversion
  table footprint using FUNsio and differences (with Tatsuya Mizuno)

* test/ruby/test_transcode.rb: Minor name fix (from Tatsuya Mizuno)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-12-10 11:59:12 +00:00
duerst 9998481d4e * enc/trans/gb18030-tbl.rb: Fix omission of C1 region in code table
(from Tatsuya Mizuno)

* test/ruby/test_transcode.rb: Added test for converting full range of
  Unicode codepoints from/to GB18030 (from Tatsuya Mizuno)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25980 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-12-03 11:29:33 +00:00
akr cc128e3ecf * enc/trans/newline.trans (fun_so_universal_newline): generate \n
after \r\n detection instead of just after \r.
  [ruby-list:45988] [ruby-core:25881] [ruby-core:26788]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25883 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-11-22 19:15:55 +00:00
duerst e0436c54c2 * enc/big5.c, enc/trans/big5.trans, enc/trans/big5-uao-tbl.rb,
test/ruby/test-transcode.rb: Added Encoding 'Big5-UAO' and transcoding
  for it (from Tatsuya Mizuno) (see Bug #1784)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25822 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-11-17 08:56:11 +00:00
duerst 2886207584 * enc/trans/big5.trans, big5-hkscs-tbl.rb:
new Chinese BIG5-HKSCS transcoding (with Tatsuya Mizuno)

* test/ruby/test_transcode.rb: added tests for the above
  (with Tatsuya Mizuno)

* enc/big5.c: Added BIG5-HKSCS as a replicate encoding of BIG5
  (short term solution, needs more work; with Tatsuya Mizuno)

* tool/transcode-tblgen.rb: made 'pat' directly accessible in
  class StrSet


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24264 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-07-24 10:26:18 +00:00
naruse d9cf0f822f * enc/trans/utf8_mac.trans: remove wrong optimization.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23686 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-06-13 18:55:55 +00:00
naruse 3abca796f4 Fix: DON'T move in_p because before in_p is replaced by buffered data.
* transcode.c: NOMAP is now multibyte direct map.

* transcode.c: remove ASIS.

* transcode_data.h: ditto.

* tool/transcode-tb (ActionMap#generate_info): remove :asis.

* tool/transcode-tb (ActionMap#generate_info): add :nomap0.

* enc/trans/utf8_mac.trans: replace :asis by :nomap0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23344 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-05-05 00:05:11 +00:00
naruse f207f9fd51 * enc/trans/utf8_mac-tbl.rb: don't use Unicode escape.
* enc/trans/utf8_mac.trans: follow above.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23325 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-05-02 01:38:27 +00:00