Commit graph

570 commits

Author SHA1 Message Date
naruse 2873edeafb Merge Onigmo 6.0.0
* https://github.com/k-takata/Onigmo/blob/Onigmo-6.0.0/HISTORY
* fix for ruby 2.4: https://github.com/k-takata/Onigmo/pull/78
* suppress warning: https://github.com/k-takata/Onigmo/pull/79
* include/ruby/oniguruma.h: include onigmo.h.
* template/encdb.h.tmpl: ignore duplicated definition of EUC-CN in
  enc/euc_kr.c. It is defined in enc/gb2312.c with a CRuby macro.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-10 17:47:04 +00:00
duerst 8baa73be48 remove special processing for U+03B9/U+03BC/U+A64B
* enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B
  (GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK)
  from onigenc_unicode_case_map and simplify code.

* enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B.

This and the previous few related commits make sure that we won't hit
the equivalent of bug #12990 anymore for future updates of Unicode versions.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-04 01:58:54 +00:00
duerst 31fb4e3ec3 Reorder codepoints in some entries of CaseUnfold_11_Table
* enc/unicode/case-folding.rb: Reorder codepoints so that the upper-case
  mapping comes first.
* enc/unicode/9.0.0/casefold.h: Codepoints reordered, upper-case mapping
  flag added.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56975 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-04 01:17:34 +00:00
nobu 671c929f0a Use offsetof macro and shrink table size
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-01 00:34:42 +00:00
nobu 4f7c3d3583 constify CaseMappingSpecials
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-01 00:34:41 +00:00
naruse c11e648799 Regexp supports Unicode 9.0.0's \X
* meta character \X matches Unicode 9.0.0 characters with some workarounds
  for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
  [Feature #12831] [ruby-core:77586]

The term "character" can have many meanings: bytes, codepoints, combined
characters, and so on. "Grapheme cluster" is the highest-level of these
terms; it means a user-perceived character.
Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to
handle grapheme clusters (extended grapheme clusters).
But some specs have not been updated to reflect the current situation,
because Unicode Emoji is being extended rapidly without a precise definition.
This breaks the precondition of UAX #29, "Grapheme cluster boundaries can be
easily tested by looking at immediately adjacent characters" (the
sentence will be removed in the next version).
Some of the details are described in Unicode Technical Report #51,
UNICODE EMOJI, but they have not been merged into UAX #29 yet.

http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 17:29:19 +00:00
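
The difference between codepoints and grapheme clusters described in the entry above can be illustrated with a small Ruby sketch. It assumes a Ruby version that includes this change (2.4 or later); the helper name `grapheme_clusters` is hypothetical and only serves the illustration. Emoji ZWJ sequences are deliberately left out, since their segmentation depends on the Unicode version bundled with the interpreter.

```ruby
# Split a string into user-perceived characters using the \X metacharacter.
# grapheme_clusters is a hypothetical helper name, not part of Ruby itself.
def grapheme_clusters(text)
  text.scan(/\X/)
end

decomposed = "e\u0301" # "e" followed by U+0301 COMBINING ACUTE ACCENT

puts decomposed.codepoints.length          # 2 codepoints
puts grapheme_clusters(decomposed).length  # 1 grapheme cluster

# CR LF is treated as a single cluster, so this yields "a", "\r\n", "b".
p grapheme_clusters("a\r\nb")
```

Counting with `String#length` would report two characters for the decomposed string, which is why `\X` is the better fit whenever the count has to match what a user sees.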
duerst 87b937bdfd fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK
* enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC
  at the end of onigenc_unicode_case_map (Bug #12990).

* enc/unicode/case-folding.rb: Add U+A64B to the special cases
  03B9 and 03BC. Add a comment pointing to enc/unicode.c.
  Change warnings to exceptions for unpredicted cases,
  because this would have been more easily noticed
  (the warning was not noticed when upgrading to Unicode 9.0.0).

* test/ruby/enc/test_case_comprehensive.rb: Remove temporary
  exclusion of U+A64B from testing.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 08:25:46 +00:00
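
As a rough check of the behaviour this entry is about, the following Ruby sketch upcases the three codepoints named in the message and compares the results with their capital counterparts. It assumes a Ruby build that carries this fix and Unicode 9.0.0 or later data; the constant `SPECIAL_CASES` and the helper `upcase_mismatches` are hypothetical names introduced for the illustration.

```ruby
# Lowercase codepoints named in the commit message, paired with the
# uppercase letters they are expected to map to.
# SPECIAL_CASES and upcase_mismatches are hypothetical names.
SPECIAL_CASES = {
  "\u03B9" => "\u0399", # GREEK SMALL LETTER IOTA -> GREEK CAPITAL LETTER IOTA
  "\u03BC" => "\u039C", # GREEK SMALL LETTER MU -> GREEK CAPITAL LETTER MU
  "\uA64B" => "\uA64A"  # CYRILLIC SMALL LETTER MONOGRAPH UK -> capital form
}.freeze

# Return the pairs for which String#upcase does not give the expected letter.
def upcase_mismatches(pairs)
  pairs.reject { |lower, upper| lower.upcase == upper }
end

mismatches = upcase_mismatches(SPECIAL_CASES)
puts mismatches.empty? ? "all mappings as expected" : mismatches.inspect
```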
duerst 2959b5aa16 * enc/windows_1254.c: Fix typo. Reported by k-takata at
https://github.com/k-takata/Onigmo/commit/ceb59cc. Thanks!


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56523 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-29 21:39:37 +00:00
nobu b24e093296 Update windows-1255 table
* enc/trans/windows-1255-tbl.rb: update mapping from 0xCA to
  U+05BA.  [Feature #12877]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 15:14:32 +00:00
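
The mapping change in this entry can be observed from Ruby by transcoding the single byte 0xCA. This is a minimal sketch assuming a Ruby build whose Windows-1255 table includes the update; `decode_windows_1255` is a hypothetical helper name.

```ruby
# Interpret raw bytes as Windows-1255 and transcode them to UTF-8.
# decode_windows_1255 is a hypothetical helper used only for illustration.
def decode_windows_1255(bytes)
  bytes.pack("C*").force_encoding("Windows-1255").encode("UTF-8")
end

decoded = decode_windows_1255([0xCA])

# Print the resulting codepoint in U+XXXX notation.
printf("U+%04X\n", decoded.ord)
```

On a build without the updated table, the same call may map the byte differently or raise `Encoding::UndefinedConversionError`.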
nobu 4f7a051eee enc/depend: downcase
* enc/depend: downcase table file names.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56515 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 14:25:57 +00:00
nobu 8027dfafd7 enc/depend: extract transcode_tblgen
* enc/depend: extract transcode_tblgen method calls for libraries
  loaded by dynamically generated names, in single_byte.trans.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56514 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 14:22:34 +00:00
nobu 06711fd1e3 single_byte.trans: dead code
* enc/trans/single_byte.trans (transcode_tblgen_singlebyte):
  remove useless code.  returned value is not used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-28 14:18:52 +00:00
duerst 64b62f40a5 * enc/windows_1254.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1254.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56433 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-10-16 06:09:08 +00:00
duerst c0f48f2385 * unicode/8.0.0/casefold.h, name2ctype.h, unicode/data/8.0.0:
removing directories/files related to Unicode version 8.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-07 08:35:39 +00:00
duerst d25e478e91 * common.mk: Updated Unicode version to 9.0.0 [Feature #12513]
* unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0:
  new directories/files for Unicode version 9.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-07 08:13:08 +00:00
nobu 7b664abad1 common.mk: separate unicode headers
* common.mk (UNICODE_HDR_DIR): separate unicode header files from
  unicode data files.  [ruby-core:76879] [Bug #12677]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55942 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-16 12:04:34 +00:00
nobu a44caf8067 common.mk: UNICODE_HDR_DIR
* common.mk (UNICODE_HDR_DIR): directory for unicode headers.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-16 08:53:49 +00:00
nobu bfb5b0f84b iso_8859_2.c: dedent [ci skip]
* enc/iso_8859_2.c: remove unnecessary indent.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55780 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-30 10:32:06 +00:00
duerst 4abdd6c5aa * enc/iso_8859_2.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-2, by Yushiro Ishii.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-30 03:00:09 +00:00
duerst 55378a9eb6 * enc/windows_1253.c: Remove dead code found by Coverity Scan.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-27 01:33:01 +00:00
duerst 7b2b2869c9 * enc/windows_1257.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1257, by Sho Koike.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 07:33:18 +00:00
duerst 14dd8a17e8 * enc/windows_1250.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1250, by Sho Koike.
* ChangeLog: Fixed order of previous two entries.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 07:19:43 +00:00
duerst aec1ac6e51 * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55750 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 06:54:18 +00:00
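
The effect of the non-ASCII case conversion entries can be sketched as follows, using Windows-1251 as the example. The code assumes Ruby 2.4 or later, where these changes shipped; `upcase_in` is a hypothetical helper that converts the text into the legacy encoding, upcases it there, and converts back to UTF-8 for display.

```ruby
# Upcase a string while it is encoded in a legacy single-byte encoding.
# upcase_in is a hypothetical helper name.
def upcase_in(text, encoding_name)
  legacy = text.encode(encoding_name) # transcode from UTF-8 to the legacy encoding
  legacy.upcase.encode("UTF-8")       # upcase there, then return to UTF-8
end

small_ya = "\u044F" # CYRILLIC SMALL LETTER YA
puts upcase_in(small_ya, "Windows-1251") == "\u042F" # compare with CAPITAL LETTER YA
```

Before this series of changes, `upcase` on such strings only affected ASCII letters, so the comparison would have been false.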
duerst c8a1d8b33b * enc/windows_1251.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1251, by Shunsuke Sato.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-26 06:30:39 +00:00
duerst 6ed393ad89 * regenc.h/c, include/ruby/oniguruma.h, enc/ascii.c, big5.c, cp949.c,
emacs_mule.c, euc_jp.c, euc_kr.c, euc_tw.c, gb18030.c, gbk.c,
  iso_8859_1|2|3|4|5|6|7|8|9|10|11|13|14|15|16.c, koi8_r.c, koi8_u.c,
  shift_jis.c, unicode.c, us_ascii.c, utf_16|32be|le.c, utf_8.c,
  windows_1250|51|52|53|54|57.c, windows_31j.c, unicode.c:
  Remove conditional compilation macro ONIG_CASE_MAPPING. [Feature #12386].


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55740 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-24 07:33:15 +00:00
nobu af2d3c9866 Move generated headers to unicode data directory
* common.mk, enc/depend (casefold.h, name2ctype.h): move to
  unicode data directory per version.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-17 11:59:26 +00:00
nobu d54856c1d4 common.mk: directory timestamps
* common.mk, enc/Makefile.in: moved timestamp files for
  directories under the specific directory, to get rid of match
  with files under the source directory.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55696 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 21:26:02 +00:00
usa 251d87583b Revert r55693 because it broke building on all platforms (and had no ChangeLog).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55694 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 20:24:01 +00:00
nobu 2ace43ba57 common.mk: directory timestamps
* common.mk, enc/Makefile.in: moved timestamp files for
  directories under the specific directory, to get rid of match
  with files under the source directory.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55693 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 14:08:20 +00:00
nobu e827c334c3 enc/unicode: check Unicode versions
* enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if
  Unicode versions are consistent with each other.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 00:53:50 +00:00
nobu 2f87f9e63b common.mk: update enc/unicode/name2ctype.h
* Makefile.in (enc/unicode/name2ctype.h): remove stale recipe,
  which did not support Unicode age properties.
* common.mk (enc/unicode/name2ctype.h): update by --header option
  of tool/enc-unicode.rb.  enc/unicode/name2ctype.kwd file has not
  been used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55678 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14 08:26:04 +00:00
kazu 9632e413a7 Fix file name in comment again
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55670 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 14:22:22 +00:00
duerst 2ac58e6891 * enc/iso_8859_9.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-9, by Kazuki Iijima.
* enc/iso_8859_9.c: Exclude dotless i/I with dot from case-insensitive
  matching because they are not a case pair.
* test/ruby/enc/test_iso_8859.rb: Make test coverage for ISO-8859-9
  a bit more complete.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 09:09:47 +00:00
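
The remark that dotless i and I with dot are not a case pair reflects Turkic casing rules. The sketch below shows them with the `:turkic` option of `String#upcase` and `String#downcase`, using UTF-8 strings rather than ISO-8859-9, and assuming Ruby 2.4 or later; `turkic_case_pair` is a hypothetical helper name.

```ruby
# Return the Turkic uppercase of "i" and the Turkic lowercase of "I".
# turkic_case_pair is a hypothetical helper name.
def turkic_case_pair
  ["i".upcase(:turkic), "I".downcase(:turkic)]
end

upper, lower = turkic_case_pair
printf("U+%04X U+%04X\n", upper.ord, lower.ord) # codepoints of the two results

# Without the option, the default mapping pairs "i" with "I".
puts "i".upcase
```

Under Turkic rules, "i" pairs with U+0130 (capital I with dot above) and "I" pairs with U+0131 (dotless small i), so neither of the two special letters pairs with the other.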
duerst 9f74ae4cf5 * enc/windows_1252.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for Windows-1252, by Serina Tai.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55665 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 08:21:29 +00:00
duerst 6a52a5488a * enc/iso_8859_7.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-7, by Kosuke Kurihara.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55664 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 07:19:25 +00:00
nobu fc4b2d9228 Fix file names in comments
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55661 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 06:32:37 +00:00
duerst b9cd6920d2 * enc/iso_8859_1.c, enc/iso_8859_4.c: Avoid setting modification flag if
there is no modification.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55660 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 06:19:07 +00:00
duerst e3600eaca1 * enc/iso_8859_5.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-5, by Masaru Onodera.
* test/ruby/enc/test_case_comprehensive.rb: Fix order of encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 05:40:12 +00:00
duerst c5682ac490 * enc/windows_1254.c: Adjust variable/macro names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55654 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 05:15:28 +00:00
duerst 93c1109c19 * enc/iso_8859_9.c, enc/windows_1254.c: Split Windows-1254 from
ISO-8859-9 to be able to implement different case conversions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55653 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 04:19:17 +00:00
duerst cbc947885a * enc/iso_8859_7.c, enc/windows_1253.c: Split Windows-1253 from
ISO-8859-7 to be able to implement different case conversions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55652 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 04:08:36 +00:00
duerst 336b6b1980 * enc/iso_8859_13.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-13, by Kanon Shindo.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 01:50:17 +00:00
duerst 0f3d197da1 * enc/iso_8859_13.c, enc/windows_1257.c: Split Windows-1257 from
ISO-8859-13 to be able to implement different case conversions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55650 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 01:31:44 +00:00
duerst 19b5e818dd * enc/iso_8859_3.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
* test/ruby/enc/test_case_comprehensive.rb: Extend special treatment
  for Turkic.
* enc/iso_8859_3.c: Exclude dotless i/I with dot from case-insensitive
  matching because they are not a case pair.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55648 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-13 00:02:34 +00:00
duerst 7c0cb4351a * revert r55642 (previous commit) because of test failure at
https://travis-ci.org/ruby/ruby/builds/144148780


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55643 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-12 12:59:46 +00:00
duerst 7b66f0bae9 * enc/iso_8859_3.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-3, by Takuya Miyamoto.
* test/ruby/enc/test_case_comprehensive.rb: Extend special treatment
  for Turkic.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55642 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-12 12:33:17 +00:00
duerst 7253570a83 * enc/iso_8859_1.c: Moved test for lowercase characters without
uppercase equivalent.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55632 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-11 09:05:53 +00:00
duerst b5d869a89d * enc/iso_8859_4.c, enc/iso_8859_10.c, enc/iso_8859_14.c,
enc/iso_8859_15.c, enc/iso_8859_16.c: Replace case-by-case code with
  lookup in ENC_ISO_8859_xx_TO_LOWER_CASE table.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55631 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-11 08:49:38 +00:00
nobu a00ec4cf3a enc/iso_8859_4.c: adjust indent [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-10 14:28:25 +00:00
duerst 07ac66ccec * enc/iso_8859_10.c, test/ruby/enc/test_case_comprehensive.rb:
Implement non-ASCII case conversion for ISO-8859-10, by Toya Hosokawa.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55627 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-10 10:53:45 +00:00