Commit Graph

97 Commits

Author SHA1 Message Date
duerst 50eea44a27 remove Unicode 12.0.0 related directory and generated files
This completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67453 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05 23:52:15 +00:00
duerst 7fe64d17d3 update to Unicode Version 12.1.0 (beta)
Unicode Version 12.1.0 adds one single character, U+32FF SQUARE ERA NAME REIWA,
for the new Japanese era starting on May 1st. 12.1.0 will be finalized only on
May 7th, so we go with the beta version because further changes in the data we
need are highly unlikely, and we want to make sure Ruby is ready for the new era.

* common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

* enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h:
  add directory and generated data files for new version

* lib/unicode_normalize/tables.rb: update for new character

* test/ruby/test_regexp.rb: add test for character property age=12.1

* test/test_unicode_normalize.rb: add test for NFKC decomposition of new character

This (mostly) completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05 00:58:51 +00:00
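The two tests named in the 12.1.0 commit above can be approximated in a few lines of Ruby. This is a minimal sketch, assuming an interpreter built with Unicode 12.1.0 or later data; the variable names are illustrative and not taken from the Ruby test suite.

```ruby
# Sketch only: assumes Ruby built with Unicode 12.1.0+ data (names are illustrative).
reiwa = "\u32FF"                        # SQUARE ERA NAME REIWA

# NFKC replaces the square compatibility character with its two kanji.
nfkc = reiwa.unicode_normalize(:nfkc)
puts nfkc == "\u4EE4\u548C"

# The character property age=12.1 covers characters assigned up to Unicode 12.1.
age_ok = reiwa.match?(/\p{age=12.1}/)
puts age_ok

# Under age=12.0 the same codepoint is still unassigned, so it should not match.
age_prev = reiwa.match?(/\p{age=12.0}/)
puts age_prev
```

On an older interpreter the `age=12.1` property is unknown and the regexp literal fails to compile, which is itself a quick way to tell which Unicode data a given Ruby was built with.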
duerst f831ca6764 delete directory and files related to Unicode version 11.0.0
this completes and closes feature #15321

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67174 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06 03:19:10 +00:00
duerst cff7eefa07 update Unicode version (and Emoji version) to 12.0.0
- common.mk: set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0

- lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0

- enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated
  files for Unicode version 12.0.0

This is the main commit for #15321.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06 01:55:19 +00:00
duerst c2d8078e3d delete Unicode 10.0.0 related files, no longer needed [#14802]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 02:02:45 +00:00
duerst 66a6073859 update to Unicode 11.0.0 (main step, not complete yet)
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 08:10:24 +00:00
nobu 7aaf5b2878 Embed the Emoji version
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 06:44:02 +00:00
duerst 2d5b57d63c prepare for Unicode 11.0.0 update
- enc/unicode/case-folding.rb:
  - Convert unpredicted case to actual flag setting
  - Eliminate an unused variable
  - Change a variable name to avoid a warning

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-23 06:45:26 +00:00
nobu 34cc6fef83 Make some internal functions static
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-16 06:52:00 +00:00
duerst a5818630f8 revert r65091, r65090 because ci fails
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:53:37 +00:00
duerst 33b5c610a6 update to Unicode 11.0.0 (basic step, not complete yet)
- common.mk: Change Unicode version to 11.0.0
- enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with
  Georgian Mtavruli. This should bring us up to the same level as e.g.
  Python 3.7, by following the Unicode tables exactly. But it will
  produce undesirable (mixed-case) results for String#capitalize.
  This will be addressed in a later commit.
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files.
- lib/unicode_normalize/tables.rb: Updated table.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:01:55 +00:00
duerst 7223582866 add some comments to enc/unicode/case-folding.rb [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 06:41:47 +00:00
nobu a4804fbdf5 support gperf 3.1
* tool/gperf.sed: extracted sed commands to a script.  ANSI-C code
  produced by gperf 3.1 declares length arguments as `size_t`.  it
  causes conflict with existing declarations, and needs casts for
  a local variable and return statements.
  [Feature #13883]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61076 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-08 05:51:19 +00:00
nobu 01830719f6 fix for emoji-data.txt
* common.mk: download emoji-data.txt.  As emoji data files are
  located in a separate directory on the Unicode.org site, rearranged
  the Unicode data file directories to match the site.

* tool/enc-unicode.rb (get_file): search emoji data files in the
  second argument path.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60977 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-02 03:12:51 +00:00
duerst df155f092c remove Unicode 9.0.0-related files
We don't need these files anymore because we upgraded to Unicode 10.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59760 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 08:12:02 +00:00
duerst 04547c7dc0 update Ruby to Unicode 10.0.0
- In common.mk, set UNICODE_VERSION to 10.0.0
- Generate and add enc/unicode/10.0.0/casefold.h and
  enc/unicode/10.0.0/name2ctype.h
- Update lib/unicode_normalize/tables.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 07:56:41 +00:00
nobu 8083a359d0 enc-unicode.rb: uniname2ctype_offset
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-23 07:59:56 +00:00
nobu 12b8058661 update name2ctype.h
* enc/unicode/9.0.0/name2ctype.h: update due to merger of Onigmo
  6.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58064 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-23 07:53:35 +00:00
duerst 8baa73be48 remove special processing for U+03B9/U+03BC/U+A64B
* enc/unicode.c: Remove special processing for U+03B9/U+03BC/U+A64B
  (GREEK SMALL LETTERs IOTA/MU, CYRILLIC SMALL LETTER MONOGRAPH UK)
  from onigenc_unicode_case_map and simplify code.

* enc/unicode/case-folding.rb: Remove check for U+03B9/U+03BC/U+A64B.

This and the previous few related commits make sure that we won't hit
the equivalent of bug #12990 anymore for future updates of Unicode versions.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-04 01:58:54 +00:00
duerst 31fb4e3ec3 Reorder codepoints in some entries of CaseUnfold_11_Table
* enc/unicode/case-folding.rb: Reorder codepoints so that the upper-case
  mapping comes first.
* enc/unicode/9.0.0/casefold.h: Codepoints reordered, upper-case mapping
  flag added.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56975 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-04 01:17:34 +00:00
nobu 671c929f0a Use offsetof macro and shrink table size
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56952 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-01 00:34:42 +00:00
nobu 4f7c3d3583 constify CaseMappingSpecials
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-01 00:34:41 +00:00
naruse c11e648799 Regexp supports Unicode 9.0.0's \X
* meta character \X matches Unicode 9.0.0 characters with some workarounds
  for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
  [Feature #12831] [ruby-core:77586]

The term "character" can have many meanings: bytes, codepoints, combined
characters, and so on. "Grapheme cluster" is the highest-level of these terms;
it means a user-perceived character.
Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to
handle grapheme clusters (extended grapheme clusters).
But some specs have not been updated to the current situation, because Unicode
Emoji is being extended rapidly without being well defined.
This breaks the precondition in UAX #29 that "Grapheme cluster boundaries can be
easily tested by looking at immediately adjacent characters" (the
sentence will be removed in the next version).
Some of the details are described in Unicode Technical Report #51,
UNICODE EMOJI, but they have not been merged into UAX #29 yet.

http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 17:29:19 +00:00
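To make the distinction between codepoints and grapheme clusters in the commit above concrete, here is a minimal sketch that counts both for a few strings. It assumes a Ruby whose `\X` follows extended grapheme clusters, including emoji ZWJ sequences; the sample strings and labels are illustrative choices, not taken from the commit.

```ruby
# Sketch only: counts assume a Ruby whose \X follows extended grapheme clusters.
samples = {
  "e\u0301" => "e + COMBINING ACUTE ACCENT",
  "\r\n" => "CR LF",
  "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}" => "emoji ZWJ sequence (family)",
}

# For each sample, split with \X and report codepoints versus clusters.
counts = samples.map do |text, label|
  clusters = text.scan(/\X/)
  puts "#{label}: #{text.codepoints.size} codepoints, #{clusters.size} cluster(s)"
  clusters.size
end
```

Each sample is made of several codepoints but is meant to be perceived as one character, so on such an interpreter every cluster count should come out as one.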
duerst 87b937bdfd fix uppercasing for U+A64B, CYRILLIC SMALL LETTER MONOGRAPH UK
* enc/unicode.c: Add U+A64B to the special cases 03B9 and 03BC
  at the end of onigenc_unicode_case_map (Bug #12990).

* enc/unicode/case-folding.rb: Add U+A64B to the special cases
  03B9 and 03BC. Add a comment pointing to enc/unicode.c.
  Change warnings to exceptions for unpredicted cases,
  because this would have been more easily noticed
  (the warning was not noticed when upgrading to Unicode 9.0.0).

* test/ruby/enc/test_case_comprehensive.rb: Remove temporary
  exclusion of U+A64B from testing.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56941 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 08:25:46 +00:00
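The uppercasing behaviour for the three codepoints named in the commit above can be checked directly. A minimal sketch, assuming Ruby 2.4 or later with Unicode 9.0.0 or later case-mapping data; the `pairs` table is an illustrative construct, not code from the repository.

```ruby
# Sketch only: assumes Ruby 2.4+ (Unicode 9.0.0 or later case-mapping data).
pairs = {
  "\u03B9" => "\u0399",  # GREEK SMALL LETTER IOTA -> GREEK CAPITAL LETTER IOTA
  "\u03BC" => "\u039C",  # GREEK SMALL LETTER MU -> GREEK CAPITAL LETTER MU
  "\uA64B" => "\uA64A",  # CYRILLIC SMALL LETTER MONOGRAPH UK -> capital form
}

# Compare String#upcase against the expected capital letter for each entry.
results = pairs.map { |lower, upper| lower.upcase == upper }
puts results.inspect
```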
duerst c0f48f2385 * unicode/8.0.0/casefold.h, name2ctype.h, unicode/data/8.0.0:
removing directories/files related to Unicode version 8.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-07 08:35:39 +00:00
duerst d25e478e91 * common.mk: Updated Unicode version to 9.0.0 [Feature #12513]
* unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0:
  new directories/files for Unicode version 9.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-07 08:13:08 +00:00
nobu 7b664abad1 common.mk: separate unicode headers
* common.mk (UNICODE_HDR_DIR): separate unicode header files from
  unicode data files.  [ruby-core:76879] [Bug #12677]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55942 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-08-16 12:04:34 +00:00
nobu af2d3c9866 Move generated headers to unicode data directory
* common.mk, enc/depend (casefold.h, name2ctype.h): move to
  unicode data directory per version.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55701 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-17 11:59:26 +00:00
nobu e827c334c3 enc/unicode: check Unicode versions
* enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if
  Unicode versions are consistent with each other.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-15 00:53:50 +00:00
nobu 2f87f9e63b common.mk: update enc/unicode/name2ctype.h
* Makefile.in (enc/unicode/name2ctype.h): remove stale recipe,
  which did not support Unicode age properties.
* common.mk (enc/unicode/name2ctype.h): update by --header option
  of tool/enc-unicode.rb.  enc/unicode/name2ctype.kwd file has not
  been used.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55678 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-14 08:26:04 +00:00
nobu 893bb61bcb case-folding.rb: define version numbers
* enc/unicode/case-folding.rb: define Unicode version numbers.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55546 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-30 08:24:11 +00:00
nobu 753ce99eac case-folding.rb: check version numbers
* enc/unicode/case-folding.rb: check if version numbers in each
  data files match.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55545 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-30 08:13:28 +00:00
naruse 0e585b37ec Revert "Use gperf 3.0.4"
That commit was a mistake.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55518 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-28 04:38:32 +00:00
naruse 4b31485ad8 Use gperf 3.0.4
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55514 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-27 18:30:12 +00:00
nobu 656c458665 Read CaseFolding.txt in binary mode
* enc/unicode/case-folding.rb (CaseFolding#load): read in binary
  mode to deal with non-ASCII characters in CaseFolding.txt.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55496 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-24 05:29:28 +00:00
nobu eff6873363 touch
* enc/unicode/case-folding.rb: touch the destination file.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55494 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-24 00:23:46 +00:00
nobu d1e2c50a0c Updating casefold.h
* common.mk (lib/unicode_normalize/tables.rb): should not depend
  on Unicode data files unless ALWAYS_UPDATE_UNICODE=yes, to avoid
  downloading Unicode data unnecessarily.  [ruby-dev:49681]
* common.mk (enc/unicode/casefold.h): update Unicode files in a
  sub-make, not to let the header depend on the files always.
* enc/unicode/case-folding.rb: if gperf is not usable, assume the
  existing file is OK.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-06-24 00:17:17 +00:00
duerst 5e9d33ad49 * enc/unicode/case-folding.rb, casefold.h: Data generation to implement
swapcase functionality for titlecase characters. Swapcase isn't defined
  by Unicode, because the purpose/usage of swapcase is unclear anyway.
  The implementation follows a proposal from Nobu, swapping the case of
  each component of a titlecase character individually.
  This means that the titlecase characters have to be decomposed.
* enc/unicode.c: Code using the above data.
* test/ruby/enc/test_case_mapping.rb: Tests for the above.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-04-01 11:58:47 +00:00
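The component-wise swapcase described in the commit above can be observed from Ruby code. A minimal sketch, assuming Ruby 2.4 or later with non-ASCII case mapping; U+01C5 is just one illustrative titlecase character, and the exact codepoints printed depend on the interpreter's case-mapping data.

```ruby
# Sketch only: assumes Ruby 2.4+ with non-ASCII case mapping.
titlecase_dz = "\u01C5"   # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON

# swapcase treats the titlecase character as its components (capital D,
# small z with caron) and swaps the case of each one individually.
swapped = titlecase_dz.swapcase
puts swapped.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")

# capitalize leaves an already-titlecase character unchanged.
unchanged = titlecase_dz.capitalize == titlecase_dz
puts unchanged
```

The result is expected to be a small d followed by a capital Z with caron, so the output is longer in codepoints than the input.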
duerst 78f540019a * enc/unicode/case-folding.rb, casefold.h: Tweaked handling of 6
special cases in CaseUnfold_11_Table.
* enc/unicode.c: Adjustments for above.
* test/ruby/enc/test_case_mapping.rb: Tests for the above: Some tests in
  test_titlecase activated; test_greek added. A test in test_cherokee fixed.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-29 07:53:43 +00:00
duerst 0e6f8b166d * enc/unicode/case-folding.rb, casefold.h: Removing data for idempotent
titlecasing.
* enc/unicode.c: Adjust code to data removal.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54347 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-29 04:24:55 +00:00
svn d864828fb4 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54230 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-22 12:08:31 +00:00
duerst 2f455ceca4 * include/ruby/oniguruma.h: Additional flag for characters that are titlecase.
* enc/unicode/case-folding.rb, casefold.h: Using above flag in data.
* enc/unicode.c: Marking capitalized character as unmodified if it is
  already titlecase.
* test/ruby/enc/test_case_mapping.rb: Tests for above functionality.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54229 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-22 12:08:30 +00:00
duerst 59766643db * enc/unicode/case-folding.rb, casefold.h: Streamlining approach to
case mapping data not available from case folding by unifying all
  three cases (special title, special upper, special lower).
* enc/unicode.c: Adjust macro names for above (macros are currently inactive).
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-03-11 07:11:27 +00:00
duerst c4e6964141 * enc/unicode/case-folding.rb, casefold.h: Reducing size of TitleCase
table by eliminating duplicates.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53957 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-27 08:06:17 +00:00
duerst 7feb182a08 * enc/unicode/case-folding.rb: Adding possibility for debugging output
for TitleCase table in casefold.h.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53930 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-25 10:04:59 +00:00
duerst 1cc579cb00 * enc/unicode/case-folding.rb, casefold.h: Outputting actual titlecase
data (new table, with indices from other tables).
* enc/unicode.c: Ignoring titlecase data indices for the moment.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-23 12:53:10 +00:00
duerst 8aa8847b7c * enc/unicode/case-folding.rb, casefold.h: Reading casing data from
SpecialCasing.txt.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53904 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-23 06:21:55 +00:00
duerst 4ca9138bac * enc/unicode/case-folding.rb, casefold.h: Adding flag for title-case,
not yet operational.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-22 09:34:34 +00:00
duerst 5470ce8206 * enc/unicode/case-folding.rb, casefold.h: Fixed bug that prevented inclusion
of compatibility characters in upper-/lower-case mappings.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53890 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-22 09:17:43 +00:00
duerst 6a808bda64 * enc/unicode/case-folding.rb, casefold.h: Used only first element
(rather than all) of target in CaseUnfold_11 array.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53843 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-16 10:10:37 +00:00