Commit Graph

50 Commits

Author SHA1 Message Date
Martin Dürst 99cd0e1f79 Update lib/unicode_normalize/tables.rb to Unicode version 13.0.0 2021-07-08 14:45:03 +09:00
duerst 7fe64d17d3 update to Unicode Version 12.1.0 (beta)
Unicode Version 12.1.0 adds one single character, U+32FF SQUARE ERA NAME REIWA,
for the new Japanese era starting on May 1st. 12.1.0 will be finalized only on
May 7th, so we go with the beta version because further changes in the data we
need are highly unlikely, and we want to make sure Ruby is ready for the new era.

* common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

* enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h:
  add directory and generated data files for new version

* lib/unicode_normalize/tables.rb: update for new character

* test/ruby/test_regexp.rb: add test for character property age=12.1

* test/test_unicode_normalize.rb: add test for NFKC decomposition of new character

This (mostly) completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05 00:58:51 +00:00
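As an illustration of the 12.1.0 update above, the following Ruby sketch checks the compatibility decomposition of U+32FF. It assumes a Ruby whose normalization tables are at Unicode 12.1.0 or later; the constant name REIWA is made up for this example and does not appear in the repository.

```ruby
# Hypothetical constant name, used only for this illustration.
REIWA = "\u32FF" # U+32FF SQUARE ERA NAME REIWA

# NFKC replaces the single square character with the two kanji of the era name.
decomposed = REIWA.unicode_normalize(:nfkc)
puts decomposed.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')

# Canonical normalization (NFC) leaves the character untouched, because its
# decomposition is a compatibility mapping, not a canonical one.
nfc_unchanged = REIWA.unicode_normalize(:nfc) == REIWA
puts nfc_unchanged
```

On an older Ruby with pre-12.1.0 tables, U+32FF is unassigned and NFKC would be expected to return it unchanged.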
duerst c604219e8d change lib/unicode_normalize/tables.rb to single item per line to make diffs shorter
* template/unicode_norm_gen.tmpl: Change formatting of output to produce only a
  single item (or range) for each line to make future diffs shorter and easier
  to understand and check.

* lib/unicode_normalize/tables.rb: output of the above

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67439 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-04 23:40:48 +00:00
duerst cff7eefa07 update Unicode version (and Emoji version) to 12.0.0
- common.mk: set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0

- lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0

- enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated
  files for Unicode version 12.0.0

This is the main commit for #15321.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06 01:55:19 +00:00
duerst 66a6073859 update to Unicode 11.0.0 (main step, not complete yet)
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 08:10:24 +00:00
marcandre b9d42af0f2 lib/*: Prefer require_relative over require, remove explicit extension
[#15206] [Fix GH-1976]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65506 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-02 17:52:43 +00:00
duerst a5818630f8 revert r65091, r65090 because ci fails
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:53:37 +00:00
duerst 33b5c610a6 update to Unicode 11.0.0 (basic step, not complete yet)
- common.mk: Change Unicode version to 11.0.0
- enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with
  Georgian Mtavruli. This should bring us up to the same level as e.g.
  Python 3.7, by following the Unicode tables exactly. But it will
  produce undesirable (mixed-case) results for String#capitalize.
  This will be addressed in a later commit.
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files.
- lib/unicode_normalize/tables.rb: Updated table.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:01:55 +00:00
duerst a7acec6750 fix range check for Hangul jamo trailers in Unicode normalization
* lib/unicode_normalize/normalize.rb: Fix the range check for trailing
  Hangul jamo characters in Unicode normalization. Different from
  leading or vowel jamos, where LBASE and VBASE are actual characters,
  a value equal to TBASE expresses the absence of a trailing jamo.
  This fix is technically correct, but there was no bug because
  the regular expressions in lib/unicode_normalize/tables.rb
  eliminate jamos equal to TBASE from normalization processing.

* test/test_unicode_normalize.rb: Add preventive test
  test_no_trailing_jamo based on
  d134809cd3
  just in case we ever get a regression.

This closes issue #14934, thanks to MaLin (Lin Ma) for reporting.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-07-28 09:44:33 +00:00
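The role of TBASE described in the commit above can be sketched with the arithmetic decomposition of precomposed Hangul syllables from the Unicode Standard. This is a simplified illustration, not the code in lib/unicode_normalize/normalize.rb; the method name decompose_hangul_syllable is hypothetical, and the constant values are those given in the Unicode Standard.

```ruby
# Constants as defined in the Unicode Standard for Hangul syllable decomposition.
SBASE  = 0xAC00 # first precomposed syllable
LBASE  = 0x1100 # first leading consonant (an actual character)
VBASE  = 0x1161 # first vowel (an actual character)
TBASE  = 0x11A7 # one BEFORE the first trailing consonant: means "no trailer"
VCOUNT = 21
TCOUNT = 28
NCOUNT = VCOUNT * TCOUNT # 588
SCOUNT = 19 * NCOUNT     # 11172

# Hypothetical helper: decompose one precomposed syllable into its jamos.
def decompose_hangul_syllable(char)
  s_index = char.ord - SBASE
  return char unless (0...SCOUNT).cover?(s_index)

  l = LBASE + s_index / NCOUNT
  v = VBASE + (s_index % NCOUNT) / TCOUNT
  t = TBASE + s_index % TCOUNT

  jamos = [l, v]
  # A value equal to TBASE expresses the absence of a trailing jamo,
  # so only values strictly greater than TBASE are emitted.
  jamos << t if t > TBASE
  jamos.pack('U*')
end

puts decompose_hangul_syllable("\uAC00").codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
puts decompose_hangul_syllable("\uD55C").codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
```

The strict comparison against TBASE is the point of the range check: the leading and vowel ranges start at LBASE and VBASE themselves, while the trailing range starts one code point after TBASE.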
duerst 04547c7dc0 update Ruby to Unicode 10.0.0
- In common.mk, set UNICODE_VERSION to 10.0.0
- Generate and add enc/unicode/10.0.0/casefold.h and
  enc/unicode/10.0.0/name2ctype.h
- Update lib/unicode_normalize/tables.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 07:56:41 +00:00
duerst 88892c8d65 add explanations about status of module UnicodeNormalize
In lib/unicode_normalize/normalize.rb, add explanations and clarifications
about the status of the files and the module. [ci skip]
This is in response to discussions at https://github.com/ruby/spec/pull/433
and https://bugs.ruby-lang.org/issues/5481#note-58.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58617 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-09 10:45:46 +00:00
duerst 140560e4ee move definition of String#unicode_normalized? to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalized?
  (including documentation). Leave a comment explaining that the file is now empty.
* string.c: Define String#unicode_normalized? in rb_str_unicode_normalized_p in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalized? to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 02:00:19 +00:00
duerst 90ab1ee023 move definition of String#unicode_normalize! to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize!
  (including documentation)
* string.c: Define String#unicode_normalize! in rb_str_unicode_normalize_bang in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalize! to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-04 01:36:52 +00:00
duerst 5fee67c9ba move definition of String#unicode_normalize to C to make sure it is documented
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize
  (including documentation)
* string.c: Define String#unicode_normalize in rb_str_unicode_normalize in C,
  (including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
  String#unicode_normalize to avoid warnings (when $VERBOSE==true) and
  problems when String is frozen

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-03 12:18:37 +00:00
duerst 8001dae820 rework definition of String#unicode_normalize! and #unicode_normalized?
simplify String#unicode_normalize! and #unicode_normalized?
in lib/unicode_normalize.rb by redefining them
in lib/unicode_normalize/normalize.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58538 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-02 05:34:25 +00:00
duerst 42b8713703 rework definition of String#unicode_normalize
simplify String#unicode_normalize in lib/unicode_normalize.rb
by redefining it in lib/unicode_normalize/normalize.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58537 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-02 05:15:04 +00:00
stomar 0c10564dd9 nodoc UnicodeNormalize module
* lib/unicode_normalize/normalize.rb: [DOC] nodoc the
  internal UnicodeNormalize module.
* lib/unicode_normalize/tables.rb: ditto.
* template/unicode_norm_gen.tmpl: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-12 18:07:32 +00:00
duerst d25e478e91 * common.mk: Updated Unicode version to 9.0.0 [Feature #12513]
* unicode/9.0.0/casefold.h, name2ctype.h, unicode/data/9.0.0:
  new directories/files for Unicode version 9.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-09-07 08:13:08 +00:00
kazu f477d5e27a Fix commit miss
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55713 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-20 16:00:12 +00:00
kazu 14a145095f fix typos
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55711 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-20 15:58:59 +00:00
duerst 306b64bd1a * lib/unicode_normalize/tables.rb: Remove
UnicodeNormalize::UNICODE_VERSION (#12546).


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55706 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-07-19 09:21:18 +00:00
naruse 3e92b635fb Add frozen_string_literal: false for all files
When you change this to true, you may need to add more tests.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53141 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-16 05:07:31 +00:00
akr 68ebbbfebe * lib/open-uri.rb: Remove indicator for "frozen_string_literal: true".
* lib/pp.rb: Ditto.

* lib/prettyprint.rb: Ditto.

* lib/resolv.rb: Ditto.

* lib/securerandom.rb: Ditto.

* lib/tmpdir.rb: Ditto.

* lib/unicode_normalize/tables.rb: Ditto.

* test/net/ftp/test_buffered_socket.rb: Ditto.

* test/net/ftp/test_mlsx_entry.rb: Ditto.

* test/open-uri/test_open-uri.rb: Ditto.

* test/open-uri/test_ssl.rb: Ditto.

* test/pathname/test_pathname.rb: Ditto.

* test/test_pp.rb: Ditto.

* test/test_prettyprint.rb: Ditto.

* tool/transcode-tblgen.rb: Ditto.

* ext/pathname/lib/pathname.rb: Ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52526 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-11-10 11:48:14 +00:00
duerst ae8c13f517 common.mk, lib/unicode_normalize/tables.rb: Change Unicode
Version for character normalization data from 7.0.0 to 8.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52000 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-10-02 00:20:23 +00:00
nobu a136301dea unicode_norm_gen.tmpl: end marker
* template/unicode_norm_gen.tmpl: pragma needs the end marker too,
  not only the beginning marker.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-09-29 08:00:36 +00:00
duerst 1fb502caca tool/unicode_norm_gen.tmpl, lib/unicode_normalize/tables.rb:
get rid of many .freeze commands by using frozen_string_literal
pragma.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-09-29 07:54:05 +00:00
nobu 84b5bb9802 normalize.rb: remove redundant hash
* lib/unicode_normalize/normalize.rb (UnicodeNormalize): REGEXP_K
  matches only single chars that are keys of KOMPATIBLE_TABLE, so the
  string in nfkd_one is always a single char and one of the keys of
  KOMPATIBLE_TABLE. That is, the default proc of NF_HASH_K only
  copies a pair from KOMPATIBLE_TABLE. Therefore NF_HASH_K is always
  a part of KOMPATIBLE_TABLE, and just redundant.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-03-11 03:56:44 +00:00
nobu b65b392e96 tables.rb: add
* lib/unicode_normalize/tables.rb: commit not to download and
  convert Unicode data files every time.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-11 17:41:53 +00:00
nobu 9b559f194c normalize.rb: fix syntax error
* lib/unicode_normalize/normalize.rb (normalized): fix syntax
  error, `when` clause allows `*` but not `**`.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-09 10:01:37 +00:00
duerst 62b511b6aa lib/unicode_normalize/normalize.rb: Replaced if-else by case in self.normalized? in parallel to r48309.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48338 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-09 09:33:36 +00:00
nobu d436c05163 unicode_norm_gen.tmpl: expand kompatible_table
* template/unicode_norm_gen.tmpl: expand kompatible_table so that
  recursive expansion is not needed at runtime.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-06 15:00:24 +00:00
nobu b8788417f0 normalize.rb: trivial optimizations
* lib/unicode_normalize/normalize.rb (nfc_one, normalize): trivial
  optimizations.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48309 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-06 15:00:17 +00:00
nobu 64034372b7 normalize.rb: explicit separator
* lib/unicode_normalize/normalize.rb (canonical_ordering_one):
  use explicit separator, not to depend on $,.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-06 15:00:14 +00:00
duerst 2b7f0289f8 lib/unicode_normalize/normalize.rb: Comment clarification. [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48290 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-05 23:49:55 +00:00
duerst 4fda619836 lib/unicode_normalize/normalize.rb: added US_ASCII
as trivially supported encoding (is always normalized,
and may appear mixed in with UTF-8 or other Unicode
encodings).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48134 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-25 11:09:08 +00:00
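A short Ruby sketch of what "trivially supported" means in practice for the commit above. It relies only on the public String methods; the variable names are illustrative, and the behavior for non-Unicode encodings is shown as observed in current Ruby versions rather than as something this commit states.

```ruby
# A US-ASCII string contains no characters that normalization could change.
ascii = 'Ruby'.dup.force_encoding(Encoding::US_ASCII)

ascii_normalized = ascii.unicode_normalized?
puts ascii_normalized                          # already normalized
puts ascii.unicode_normalize(:nfkc) == ascii   # normalization is a no-op

# For comparison: an encoding that is neither US-ASCII nor a Unicode
# encoding is rejected instead of being normalized.
latin1 = 'Ruby'.dup.force_encoding(Encoding::ISO_8859_1)
rejected = false
begin
  latin1.unicode_normalize
rescue Encoding::CompatibilityError => e
  rejected = true
  puts "not supported: #{e.message}"
end
```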
nobu 696141dab4 lib/unicode_normalize/tables.rb: remove auto generated file.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48074 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-21 13:48:05 +00:00
duerst 5c27164d59 lib/unicode_normalize/tables.rb: Committing to make version
update easier and more predictable, and reducing compilation
time.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-21 08:12:20 +00:00
duerst 7415796ca3 lib/unicode_normalize/normalize.rb: Added comment to point to
relevant portion of Unicode standard for Hangul (de)composition
identifiers and algorithm.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-21 06:56:58 +00:00
nobu e64a3869bc unicode_normalize/normalize.rb: rename variable
* lib/unicode_normalize/normalize.rb (hangul_decomp_one): use more
  descriptive name.  leave [SLVT]BASE and [LVTNS]COUNT as they are
  vague names.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48055 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-20 11:50:00 +00:00
duerst acaafe2101 lib/unicode_normalize.rb: revert r48046. The s in sIndex
is not Hungarian notation. The variable name sIndex is
directly taken from the relevant part of the Unicode
Standard, where it is written SIndex and stands for
'syllable index'. See pp. 144/145 of
http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-20 10:06:11 +00:00
nobu 6948188f38 unicode_normalize/normalize.rb: remove prefix
* lib/unicode_normalize/normalize.rb (hangul_decomp_one): remove
  system hungarian prefix, nonsense in ruby.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-20 05:01:02 +00:00
nobu 7f652dc6cf unicode_normalize/normalize.rb: simplify
* lib/unicode_normalize/normalize.rb (NF_HASH_{D,C,K}): remove
  first element by Hash#shift.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-20 05:00:58 +00:00
nobu 3a2f81cf9a unicode_normalize/normalize.rb: remove unnecessary module names
* lib/unicode_normalize/normalize.rb (UnicodeNormalize): use self
  instead of module name and remove module name if unnecessary.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-20 05:00:46 +00:00
nobu 51af3be356 lib/unicode_normalize.rb: remove BOMs
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48028 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 15:29:58 +00:00
duerst 8c722a9a1e lib/unicode_normalize/normalize.rb: Added a missing file extension in require statement.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48022 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 09:35:45 +00:00
duerst 982f0de141 tool/unicode_norm_gen.rb, lib/unicode_normalize.rb:
File name change from lib/unicode_normalize/normalize_tables.rb
to lib/unicode_normalize/tables.rb.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48015 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 02:09:13 +00:00
svn 4bf30d2944 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48009 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 00:48:55 +00:00
duerst 4c769ce021 lib/unicode_normalize/normalize.rb: Changed module name, adjusted copyright.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48008 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 00:48:52 +00:00
svn d64dc54e0c * properties.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48007 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 00:38:53 +00:00
duerst 6017de0314 lib/unicode_normalize/normalize.rb: Importing from
https://github.com/duerst/eprun/blob/master/lib/normalize.rb.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48005 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-10-19 00:38:40 +00:00