Граф коммитов

619 Коммитов

Автор SHA1 Сообщение Дата
duerst 50eea44a27 remove Unicode 12.0.0 related directory and generated files
This completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67453 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05 23:52:15 +00:00
duerst 7fe64d17d3 update to Unicode Version 12.1.0 (beta)
Unicode Version 12.1.0 adds one single character, U+32FF SQUARE ERA NAME REIWA,
for the new Japanese era starting on May 1st. 12.1.0 will be finalized only on
May 7th, so we go with the beta version because further changes in the data we
need are highly unlikely, and we want to make sure Ruby is ready for the new era.

* common.mk: change UNICODE_VERSION to 12.1.0, UNICODE_BETA to YES

* enc/unicode/12.1.0, enc/unicode/12.1.0/casefold.h, enc/unicode/12.1.0/name2ctype.h:
  add directory and generated data files for new version

* lib/unicode_normalize/tables.rb: update for new character

* test/ruby/test_regexp.rb: add test for character property age=12.1

* test/test_unicode_normalize.rb: add test for NFKC decomposition of new character

This (mostly) completes issue #15195.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67441 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-05 00:58:51 +00:00
duerst f831ca6764 delete directory and files related to Unicode version 11.0.0
this completes and closes feature #15321

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67174 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06 03:19:10 +00:00
duerst cff7eefa07 update Unicode version (and Emoji version) to 12.0.0
- common.mk: set UNICODE_VERSION and UNICODE_EMOJI_VERSION to 12.0.0

- lib/unicode_normalize/tables.rb: update table data to Unicode version 12.0.0

- enc/unicode/12.0.0/casefold.h, enc/unicode/12.0.0/name2ctype.h: add generated
  files for Unicode version 12.0.0

This is the main commit for #15321.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-03-06 01:55:19 +00:00
nobu 4fc656a2f3 Removed duplicate dependents
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-14 05:40:08 +00:00
nobu b2f4241509 Update dependencies, internal.h includes ruby.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-02-12 12:31:55 +00:00
duerst 3628eae2e7 implement special behavior for Georgian for String#capitalize
The modern Georgian script is special in that it has an 'uppercase'
variant called MTAVRULI which can be used for emphasis of whole words,
for screamy headlines, and so on. However, in contrast to all other
bicameral scripts, there is no usage of capitalizing the first letter
in a word or a sentence. Words with mixed capitalization are not used
at all.

We therefore implement special behavior for String#capitalize. Formally,
we define String#capitalize as first applying String#downcase for the
whole string, then using titlecase on the first letter. Because Georgian
defines titlecase as the identity function both for MTAVRULI ('uppercase')
and Mkhedruli (lowercase), this results in String#capitalize being
equivalent to String#downcase for Georgian. This avoids undesirable
mixed case.

* enc/unicode.c: Actual implementation

* string.c: Add mention of this special case for documentation

* test/ruby/enc/test_case_mapping.rb: Add two tests, a general one
  that uses String#capitalize on some (including nonsensical)
  combinations of MTAVRULI and Mkhedruli, and a canary test to
  detect the potential assignment of characters to the currently
  open slots (holes) at U+1CBB and U+1CBC.

* test/ruby/enc/test_case_comprehensive.rb: Tweak generation of
  expectation data.

Together with r65933, this closes issue #14839.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66300 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 23:14:29 +00:00
duerst c2d8078e3d delete Unicode 10.0.0 related files, no longer needed [#14802]
This line, and those below, will be ignored--

D    enc/unicode/10.0.0


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-09 02:02:45 +00:00
duerst e824e21beb remove obsolete data from unicode.c
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
  onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
  because they are not needed anymore for Unicode 11.0.0.

* regparse.c: Remove external declarations for above arrays.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 00:05:08 +00:00
duerst 66a6073859 update to Unicode 11.0.0 (main step, not complete yet)
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 08:10:24 +00:00
duerst a96a594f99 solve the genie/zombie/wrestlers bug
enc/unicode.c: - Add U+1F93C (WRESTLERS), U+1F9DE (GENIE), and U+1F9DF
                 to onigenc_unicode_GCB_ranges_E_Base.
               - Add comments with character names.
test/ruby/enc/test_emoji_breaks.rb: Activate tests for genie/zombie/wrestlers.
This closes issue #15343.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66133 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 10:07:42 +00:00
nobu 26771cadc0 Added words in the comment at r65088 [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66103 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-30 07:19:49 +00:00
nobu 7aaf5b2878 Embed the Emoji version
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 06:44:02 +00:00
duerst fc6243a6a6 deal with ONIGENC_CASE_IS_TITLECASE flag on lowercase characters
In the function onigenc_unicode_case_map() in enc/unicode.c, deal
with the case that the ONIGENC_CASE_IS_TITLECASE flag is set on
lowercase characters. This is in preparation for Georgian Mtavruli,
which are uppercase but not titlecase, in Unicode 11.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65971 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-25 10:12:45 +00:00
duerst 2d5b57d63c prepare for Unicode 11.0.0 update
- enc/unicode/case-folding.rb:
  - Convert unpredicted case to actual flag setting
  - Eliminate an unused variable
  - Change a variable name to avoid a warning

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-23 06:45:26 +00:00
nobu 34cc6fef83 Make some internal functions static
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-16 06:52:00 +00:00
shyouhei 6732423b5e enc/unicode.c: 'a' is bigger than 'A'
In ASCII, 'a' is bigger than 'A'. Which means 'A' - 'a' is a negative
number (-32, to be precise). In C, the type of 'a' and 'A' are signed
int (cf: ISO/IEC 9899:1990 section 6.1.3.4). So 'A' - 'a' is also a
signed int. It is `(signed int)-32`.

The problem is, OnigCodePoint is unsigned int. Adding a negative
number to a variable of OnigCodepoint (`code` here) introduces an
unintentional cast of `(unsigned)(signed)-32`, which is
4,294,967,264. Adding this value to code then overflows, and the
result eventually becomes normal codepoint.

The series of operations are not a serious problem but because
`code >= 'a'` holds, we can `(code - 'a') + 'A'` to reroute this.

See also: https://github.com/k-takata/Onigmo/pull/107


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65752 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-16 02:34:00 +00:00
duerst a5818630f8 revert r65091, r65090 because ci fails
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65093 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:53:37 +00:00
duerst 33b5c610a6 update to Unicode 11.0.0 (basic step, not complete yet)
- common.mk: Change Unicode version to 11.0.0
- enc/unicode/case-folding.rb, enc/unicode.c: Initial changes to deal with
  Gregorian Mtavruli. This should bring us up to the same level as e.g.
  Python 3.7, by following the Unicode tables exactly. But it will
  produce undesirable (mixed-case) results for String#capitalize.
  This will be addressed in a later commit.
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files.
- lib/unicode_normalize/tables.rb: Updated table.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 07:01:55 +00:00
duerst 7223582866 add some comments to enc/unicode/case-folding.rb [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 06:41:47 +00:00
nobu 0814870fed Removed data for old Unicode [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 05:14:59 +00:00
nobu e07c5baf66 unicode.c: moved addtional GCB ranges
* enc/unicode.c: moved additional Grapheme Cluster Break ranges
  which depend on the Unicode version.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-15 13:48:20 +00:00
nobu 179045acaf regparse.c: Suppress duplicated range warning by mere \X
* regparse.c (node_extended_grapheme_cluster): as Unicode 10 has
  added Grapheme_Cluster_Break properties to some characters,
  remove duplicated ranges for Unicode 9.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-15 12:31:25 +00:00
naruse 1d74de37e3 ruby tool/update-deps --fix
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-07-05 12:48:45 +00:00
k0kubun 3406c5d613 Refactor ERB version checking for keyword arguments
Improving code like r62590. See r62529 for details.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62594 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-27 11:12:23 +00:00
naruse b81538d259 Judge ERB version not RUBY_VERSION but ERB.version
On cross compilation, ruby command uses fake RUBY_VERSION.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62560 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-24 02:58:41 +00:00
k0kubun cc777d09f4 erb.rb: deprecate safe_level of ERB.new
Also, as it's in the middle of the list of 4 arguments, 3rd and 4th arguments
(trim_mode, eoutvar) are changed to keyword arguments.
Old ways to specify arguments are deprecated and warned now.

bin/erb: deprecate -S option.

We'll remove all of deprecated ones at Ruby 2.7+.

enc/make_encmake.rb: stopped using deprecated interface
ext/etc/mkconstants.rb: ditto
ext/socket/mkconstants.rb: ditto
sample/ripper/ruby2html.rb: ditto
spec/ruby/library/erb/defmethod/def_erb_method_spec.rb: ditto
spec/ruby/library/erb/new_spec.rb: ditto
test/erb/test_erb.rb: ditto
test/erb/test_erb_command.rb: ditto
tool/generic_erb.rb: ditto
tool/ruby_vm/helpers/dumper.rb: ditto
tool/transcode-tblgen.rb: ditto
lib/rdoc/erbio.rb: ditto
lib/rdoc/generator/darkfish.rb: ditto

[Feature #14256]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62529 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-22 13:28:25 +00:00
nobu 982e27e95d Update dependencies
* common.mk: enc/unicode.$(OBJEXT) depends on onigmo.h via
  oniguruma.h.

* common.mk: dependencies of *prelude.$(OBJEXT) are defined for
  each generated C sources.

* enc/depend: casefold.h and name2ctype.h are located under
  $(UNICODE_HDR_DIR).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-13 09:23:40 +00:00
kazu e02e13d723 Update dependencies using `tool/update-deps`
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61799 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-13 00:30:01 +00:00
nobu aaa18bae72 update dependencies
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61715 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-09 06:55:55 +00:00
nobu 7c4306e6e9 gperf.sed: static declarations
* tool/gperf.sed: comment out arguments part only, to keep the
  following declarations static.  [Feature #13883]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61282 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-15 14:42:43 +00:00
nobu a4804fbdf5 support gperf 3.1
* tool/gperf.sed: extracted sed commands to a script.  ANSI-C code
  produced by gperf 3.1 declares length arguments as `size_t`.  it
  causes conflict with existing declarations, and needs casts for
  a local variable and return statements.
  [Feature #13883]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61076 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-08 05:51:19 +00:00
nobu 01830719f6 fix for emoji-data.txt
* common.mk: download emoji-data.txt.  As emoji data files are
  located in a separate directory in Unicode.org site, reearranged
  Unicode data files directories same as the site.

* tool/enc-unicode.rb (get_file): search emoji data files in the
  second argument path.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60977 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-02 03:12:51 +00:00
naruse 9ba2aa7644 Add missing file
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60967 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-01 14:08:13 +00:00
naruse 31796f17d3 Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.
[Bug #13892]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60966 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-01 13:50:13 +00:00
duerst df155f092c remove Unicode 9.0.0-related files
We don't need these files anymore because we upgraded to Unicode 10.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59760 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 08:12:02 +00:00
duerst 04547c7dc0 update Ruby to Unicode 10.0.0
- In common.mk, set UNICODE_VERSION  to 10.0.0
- Generate and add enc/unicode/10.0.0/casefold.h and
  enc/unicode/10.0.0/name2ctype.h
- Update lib/unicode_normalize/tables.rb

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-06 07:56:41 +00:00
duerst 3d46d51c45 replace copyrights by explanatory text in data files for GB2312/GB12345 mappings
Replace the copyrights and explanatory texts in the data files used for mapping
GB2312/GB12345 to/from Unicode with short explanatory texts. [Bug #12598] [ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59715 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-09-01 10:22:09 +00:00
duerst 11954049fa Change max byte length of UTF-8 to 4 bytes
In enc/utf_8.c, change maximum byte length of UTF-8 to 4 bytes (from 6)
to conform to definition of UTF-8. This closes issue #13590.
(This is a retry of r58954, after issue #13590 has been addressed.)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58965 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-30 05:43:41 +00:00
duerst e07bff3ce3 revert r58954 temporarily
Revert change to maximum of 4 bytes for UTF-8 characters at r58954 temporarily.
This failed spec at https://travis-ci.org/ruby/ruby/builds/237086017, but it
is totally unclear why.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58955 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-29 08:59:41 +00:00
duerst a03690ae73 Change max byte length of UTF-8 to 4 bytes
In enc/utf_8.c, change maximum byte length of UTF-8 to 4 bytes (from 6)
to conform to definition of UTF-8. This closes issue #13590.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-29 08:41:23 +00:00
duerst f86766970f delete enc/prelude.rb, because no longer needed
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58579 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-05-06 04:42:58 +00:00
nobu d97f700938 clean autogenerated files
* enc/depend (clean, clean-srcs): fix path of name2ctype.h, and
  remove casefold.h too.

* enc/jis/props.h: autogenerated file.
  [ruby-core:80823] [Bug #13493]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58438 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-21 23:16:43 +00:00
nobu ae73dc367a enc/depend: remove Unicode versions
* enc/depend (enc/unicode.o): remove hardcoded Unicode versions.
  this object file must be compiled by toplevel make.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58386 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 02:58:44 +00:00
normal d6a6971212 fix ext/-test-/struct/ dependencies
I started writing a template for auto-generation and
let "tool/update-deps --fix" fill in the rest.

Hopefully this fixes problems with some CI builds
after r58359.  Further changes to other ext/-test-/
files should probably add or update "depend" files, too.

* ext/-test-/struct/depend: new file
* enc/depend: auto-updated with unicode 9.0.0 headers (side-effect)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58364 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-15 07:13:05 +00:00
nobu 8083a359d0 enc-unicode.rb: uniname2ctype_offset
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-23 07:59:56 +00:00
nobu 12b8058661 update name2ctype.h
* enc/unicode/9.0.0/name2ctype.h: update due to merger of Onigmo
  6.0.0.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58064 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-23 07:53:35 +00:00
shyouhei 20c72dc89d ruby tool/update-deps --fix
Onigumo 6 (r57045) introduced new onigumo.h header file, which is
required from quite much everywhere.  This commit adds necessary
dependencies.

Note: ruby/oniguruma.h now includes onigumo.h,
      ruby/io.h includes oniguruma.h,
      ruby/encoding.h also includes oniguruma.h,
      and internal.h includes encoding.h.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58054 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-22 06:00:18 +00:00
nobu 4171ed6c21 fix UTF-32 valid_encoding?
* enc/utf_32be.c (utf32be_mbc_enc_len): check arguments precisely.
  [ruby-core:79966] [Bug #13292]

* enc/utf_32le.c (utf32le_mbc_enc_len): ditto.

* regenc.h (UNICODE_VALID_CODEPOINT_P): predicate for valid
  Unicode codepoints.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57816 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-03-09 02:04:10 +00:00
naruse 2873edeafb Merge Onigmo 6.0.0
* https://github.com/k-takata/Onigmo/blob/Onigmo-6.0.0/HISTORY
* fix for ruby 2.4: https://github.com/k-takata/Onigmo/pull/78
* suppress warning: https://github.com/k-takata/Onigmo/pull/79
* include/ruby/oniguruma.h: include onigmo.h.
* template/encdb.h.tmpl: ignore duplicated definition of EUC-CN in
  enc/euc_kr.c. It is defined in enc/gb2313.c with CRuby macro.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-10 17:47:04 +00:00