* enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B
(GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK)
from onigenc_unicode_case_map and simplify code.
* enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B.
This and the previous few related commits make sure that we won't hit
the equivalent of bug #12990 anymore for future updates of Unicode versions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb: Reorder codepoints so that the upper-case
mapping comes first.
* enc/unicode/9.0.0/casefold.h: Codepoints reordered, upper-case mapping
flag added.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56975 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* meta character \X matches Unicode 9.0.0 characters with some workarounds
for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
[Feature #12831] [ruby-core:77586]
The term "character" can have many meanings: bytes, codepoints, combined
characters, and so on. "Grapheme cluster" is the highest-level of these
terms; it denotes a user-perceived character.
Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION specifies how to
handle grapheme clusters (extended grapheme cluster).
But some specs have not kept up with the current situation, because Unicode
Emoji is being extended rapidly without being well defined.
This breaks the precondition of UAX #29 that "Grapheme cluster boundaries
can be easily tested by looking at immediately adjacent characters". (The
sentence will be removed in the next version.)
Some of the details are described in Unicode Technical Report #51
UNICODE EMOJI, but they have not been merged into UAX #29 yet.
http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
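As a rough illustration of the \X behavior described in the entry above (r56949), the following Ruby sketch contrasts codepoints with extended grapheme clusters. The helper name `grapheme_clusters` is hypothetical and exists only for this example. The result for the emoji ZWJ sequence depends on the Unicode and emoji data of the Ruby in use, so no expected value is given for it.

```ruby
# Split a string into extended grapheme clusters using the \X meta character.
# `grapheme_clusters` is a hypothetical helper name, not part of Ruby's API.
def grapheme_clusters(str)
  str.scan(/\X/)
end

combined = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
crlf     = "a\r\nb"    # CR LF forms a single grapheme cluster
family   = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"  # an emoji ZWJ sequence

p combined.codepoints.length          # two codepoints
p grapheme_clusters(combined).length  # one user-perceived character
p grapheme_clusters(crlf)             # "a", "\r\n", "b"
p grapheme_clusters(family).length    # depends on the emoji data in use
```

Scanning with \X rather than iterating over codepoints is what keeps a base letter and its combining marks together.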
* enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC
at the end of onigenc_unicode_case_map (Bug #12990).
* enc/unicode/case-folding.rb: Add U+A64B to the special cases
03B9 and 03BC. Add a comment pointing to enc/unicode.c.
Change warnings to exceptions for unpredicted cases,
because an exception would have been noticed more easily
(the warning went unnoticed when upgrading to Unicode 9.0.0).
* test/ruby/enc/test_case_comprehensive.rb: Remove temporary
exclusion of U+A64B from testing.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
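The effect of the entry above (r56941) can be checked from Ruby with a small sketch like the following. It assumes a Ruby build that includes the fix and Unicode 9.0.0 or later data; the constant name `SPECIAL_CASES` is made up for this example, and the expected upper-case forms are taken from the Unicode Character Database.

```ruby
# Codepoints named in the entry, paired with their upper-case forms
# according to the Unicode Character Database.
# SPECIAL_CASES is a hypothetical name used only in this sketch.
SPECIAL_CASES = {
  "\u03B9" => "\u0399",  # GREEK SMALL LETTER IOTA -> GREEK CAPITAL LETTER IOTA
  "\u03BC" => "\u039C",  # GREEK SMALL LETTER MU   -> GREEK CAPITAL LETTER MU
  "\uA64B" => "\uA64A",  # CYRILLIC SMALL LETTER MONOGRAPH UK -> capital form
}

SPECIAL_CASES.each do |lower, upper|
  printf("U+%04X upcase ok: %s, downcase round trip ok: %s\n",
         lower.ord, lower.upcase == upper, upper.downcase == lower)
end
```

On a build affected by Bug #12990, the U+A64B line would be expected to report a failed upcase check.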
* enc/trans/single_byte.trans (transcode_tblgen_singlebyte):
remove useless code; the returned value is not used.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
removing directories/files related to Unicode version 8.0.0
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0:
new directories/files for Unicode version 9.0.0
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-2, by Yushiro Ishii.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for Windows-1257, by Sho Koike.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for Windows-1250, by Sho Koike.
* ChangeLog: Fixed order of previous two entries.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55750 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk, enc/depend (casefold.h, name2ctype.h): move to
unicode data directory per version.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk, enc/Makefile.in: move timestamp files for
directories under a dedicated directory, so that they no longer
match files under the source directory.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55696 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* common.mk, enc/Makefile.in: move timestamp files for
directories under a dedicated directory, so that they no longer
match files under the source directory.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55693 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if
Unicode versions are consistent with each other.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Makefile.in (enc/unicode/name2ctype.h): remove stale recipe,
which did not support Unicode age properties.
* common.mk (enc/unicode/name2ctype.h): update by --header option
of tool/enc-unicode.rb. enc/unicode/name2ctype.kwd file has not
been used.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55678 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-9, by Kazuki Iijima.
* enc/iso_8859_9.c: Exclude dotless i/I with dot from case-insensitive
matching because they are not a case pair.
* test/ruby/enc/test_iso_8859.rb: Make test coverage for ISO-8859-9
a bit more complete.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
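Why dotless i and I with dot are "not a case pair" (entry above, r55666) can be sketched with Ruby's Unicode case mapping. This example uses UTF-8 strings rather than ISO-8859-9, as an assumption made to keep it self-contained; the constant names are hypothetical.

```ruby
DOTLESS_I    = "\u0131"  # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default (non-Turkic) mappings: neither letter maps to the other.
p DOTLESS_I.upcase == "I"              # dotless i upcases to ASCII I
p DOTTED_CAP_I.downcase == "i\u0307"   # I with dot downcases to i + COMBINING DOT ABOVE

# Turkic mappings pair each of them with an ASCII letter instead.
p "i".upcase(:turkic) == DOTTED_CAP_I
p "I".downcase(:turkic) == DOTLESS_I
```

In both modes the two letters never map to each other, which is consistent with excluding them from case-insensitive matching.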
Implement non-ASCII case conversion for Windows-1252, by Serina Tai.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55665 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-7, by Kosuke Kurihara.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55664 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-5, by Masaru Onodera.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ISO-8859-9 to be able to implement different case conversions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55653 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ISO-8859-7 to be able to implement different case conversions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55652 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-13, by Kanon Shindo.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ISO-8859-13 to be able to implement different case conversions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55650 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
* test/ruby/enc/test_case_comprehensive.rb: Extend special treatment
for Turkic.
* enc/iso_8859_3.c: Exclude dotless i/I with dot from case-insensitive
matching because they are not a case pair.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55648 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
* test/ruby/enc/test_case_comprehensive.rb: Extend special treatment
for Turkic.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55642 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Implement non-ASCII case conversion for ISO-8859-10, by Toya Hosokawa.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55627 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
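The "non-ASCII case conversion" entries above can be exercised with a sketch along these lines. It assumes a Ruby that contains these commits; `SAMPLES` and `upcase_in` are hypothetical names chosen for this example, and only a handful of the encodings listed above are sampled.

```ruby
# Encoding name => [lower-case letter, upper-case letter], written in UTF-8.
# SAMPLES and upcase_in are hypothetical names used only in this sketch.
SAMPLES = {
  "ISO-8859-2"   => ["\u00E1", "\u00C1"],  # a with acute
  "ISO-8859-5"   => ["\u0436", "\u0416"],  # Cyrillic zhe
  "ISO-8859-7"   => ["\u03B1", "\u0391"],  # Greek alpha
  "Windows-1251" => ["\u0434", "\u0414"],  # Cyrillic de
  "Windows-1252" => ["\u00E9", "\u00C9"],  # e with acute
}

# Transcode to the single-byte encoding, upcase there, and transcode back.
def upcase_in(encoding, utf8_str)
  utf8_str.encode(encoding).upcase.encode("UTF-8")
end

SAMPLES.each do |encoding, (lower, upper)|
  puts "#{encoding}: #{upcase_in(encoding, lower) == upper}"
end
```

Before these commits, upcase on strings in such encodings would be expected to leave the non-ASCII bytes unchanged.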