* tool/gperf.sed: extracted sed commands to a script. ANSI-C code
produced by gperf 3.1 declares length arguments as `size_t`, which
conflicts with the existing declarations and needs casts for
a local variable and the return statements.
[Feature #13883]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61076 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
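The kind of cast insertion described in the entry above can be sketched in Ruby. This is only an illustration: the real rules live in tool/gperf.sed and are not reproduced here, and the method name `add_casts`, the variable name `hval`, and the patterns are assumptions about what gperf output typically looks like.

```ruby
# Hypothetical sketch: add (unsigned int) casts inside a gperf-generated
# hash function, leaving every other function untouched.
def add_casts(c_source)
  in_hash = false
  c_source.lines.map { |line|
    # Assumed shape: the function name starts a line and ends in "hash".
    in_hash = true if line =~ /\A\w*hash\b/
    if in_hash
      line = line.sub(/ hval = /, ' hval = (unsigned int)')
      line = line.sub(/ return /, ' return (unsigned int)')
    end
    # A closing brace in column 0 ends the function body.
    in_hash = false if line.start_with?('}')
    line
  }.join
end
```

Tracking the function boundaries explicitly keeps the substitution from touching `return` statements elsewhere in the generated file.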
* common.mk: download emoji-data.txt. As emoji data files are
located in a separate directory on the Unicode.org site, rearranged
the Unicode data file directories to match the site's layout.
* tool/enc-unicode.rb (get_file): search for emoji data files in
the path given as the second argument.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60977 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
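A lookup across more than one data directory, as the entry above describes, might look roughly like this. The method name `find_data_file` and its signature are hypothetical and do not reflect the actual `get_file` in tool/enc-unicode.rb.

```ruby
# Hypothetical sketch: return the first existing path for a data file,
# trying each candidate directory in order (e.g. UCD dir, then emoji dir).
def find_data_file(name, dirs)
  dirs.each do |dir|
    path = File.join(dir, name)
    return path if File.exist?(path)
  end
  nil # not found in any directory
end
```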
* tool/enc-unicode.rb: support for gperf 3.1, which defines length
arguments as `size_t` but a local variable as `unsigned int`.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/enc-unicode.rb (data_foreach): version comments do not
include subdirectory names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/downloader.rb: download to the file given in ARGV.
* tool/enc-unicode.rb (parse_GraphemeBreakProperty): fix data file
path to match $(UNICODE_PROPERTY_FILES) in common.mk.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58069 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* meta character \X matches Unicode 9.0.0 characters with some workarounds
for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
[Feature #12831] [ruby-core:77586]
The term "character" can have many meanings: bytes, codepoints, combined
characters, and so on. "Grapheme cluster" is the highest-level of these
terms, and means a user-perceived character.
Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION specifies how to
handle grapheme clusters (extended grapheme cluster).
But some specs have not caught up with the current situation because
Unicode Emoji is being extended rapidly without being well defined.
This breaks the precondition of UAX #29, "Grapheme cluster boundaries can
be easily tested by looking at immediately adjacent characters". (The
sentence will be removed in the next version.)
Some of the details are described in Unicode Technical Report #51
UNICODE EMOJI, but they are not merged into UAX #29 yet.
http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
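The difference between codepoints and grapheme clusters in the entry above can be seen with a short Ruby sketch. The helper name `grapheme_clusters` is made up for illustration, and results for emoji ZWJ sequences depend on the Ruby and Unicode versions in use, so only the combining-mark case is shown with counts.

```ruby
# Split a string into user-perceived characters using the \X meta character.
def grapheme_clusters(str)
  str.scan(/\X/)
end

s = "e\u0301x"              # "e" + COMBINING ACUTE ACCENT, then "x"
s.codepoints.size           # 3 codepoints
grapheme_clusters(s).size   # 2 clusters: the accented "e" and "x"

# An emoji ZWJ sequence (MAN, ZWJ, WOMAN, ZWJ, GIRL); how it is segmented
# depends on the Unicode data the running Ruby was built with.
family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
grapheme_clusters(family)
```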
* enc/unicode/case-folding.rb, tool/enc-unicode.rb: check if
Unicode versions are consistent with each other.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55687 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/enc-unicode.rb (data_foreach): check the Unicode version in
data files, and yield each line.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55685 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
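A version check of this kind could be sketched as follows. This is not the actual `data_foreach`: the name `each_data_line`, its arguments, and the assumed header format (a first line such as `# Scripts-9.0.0.txt`) are illustrative assumptions.

```ruby
# Hypothetical sketch: verify the version comment on the first line of a
# Unicode data file, then yield the remaining lines one by one.
def each_data_line(name, text, expected_version)
  base = File.basename(name, ".txt")   # drop any subdirectory and extension
  lines = text.lines
  header = lines.shift.to_s
  m = /\A# #{Regexp.escape(base)}-(\d+\.\d+\.\d+)\.txt/.match(header)
  raise ArgumentError, "#{name}: no version comment" unless m
  version = m[1]
  unless version == expected_version
    raise ArgumentError, "#{name}: version #{version}, expected #{expected_version}"
  end
  lines.each { |line| yield line }
  version
end
```

Raising on a mismatch makes inconsistent data files fail early instead of producing a table mixed from different Unicode versions.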
* tool/enc-unicode.rb: add a comment explaining why it uses Hash#index.
* enc/unicode/{name2ctype.kwd,name2ctype.src,name2ctype.h.blt}:
update to follow the current name2ctype.h.
FYI current Unicode version is 6.1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
script should work with ruby 1.8.
* tool/enc-unicode.rb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34650 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Add Age property to regexp. [ruby-core:33019]
patched by Ammar Ali, tested by Run Paint Run Run
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29717 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
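As a rough illustration of the Age property added above: `\p{Age=N}` matches characters assigned in Unicode version N or earlier. The sample uses U+20AC EURO SIGN, which was assigned in Unicode 2.1; the constant names are only for this example.

```ruby
EURO = "\u20AC"   # EURO SIGN, assigned in Unicode 2.1

age_2_0 = /\p{Age=2.0}/   # characters assigned in Unicode 2.0 or earlier
age_2_1 = /\p{Age=2.1}/   # characters assigned in Unicode 2.1 or earlier

in_2_0 = !(age_2_0 =~ EURO).nil?   # false: not yet assigned in 2.0
in_2_1 = !(age_2_1 =~ EURO).nil?   # true
```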
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Update Oniguruma for Unicode 6.
patched by Run Paint Run Run. [ruby-core:32923] #3989
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29620 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
use UTS#18 for POSIX character class.
http://rubyspec.org/issues/show/161
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25338 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
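The effect of Unicode-aware POSIX classes can be sketched like this. The sample characters are chosen for illustration only: U+0663 is ARABIC-INDIC DIGIT THREE and U+00E9 is LATIN SMALL LETTER E WITH ACUTE.

```ruby
arabic_three = "\u0663"
e_acute      = "\u00E9"

# \d stays ASCII-only, while the POSIX bracket class follows Unicode.
ascii_digit   = !(/\d/ =~ arabic_three).nil?            # false
unicode_digit = !(/[[:digit:]]/ =~ arabic_three).nil?   # true
unicode_alpha = !(/[[:alpha:]]/ =~ e_acute).nil?        # true
```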
* tool/enc-unicode.rb,
enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
Add DerivedCoreProperties, PropList (Binary Property),
PropertyAlias and PropertyValueAlias.
Now users of tool/enc-unicode.rb should specify
the directory of UCD files.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
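The binary properties and aliases named in the entry above become usable as `\p{...}` names in regular expressions. A small sketch, with sample characters picked only for illustration:

```ruby
nbsp  = "\u00A0"   # NO-BREAK SPACE
alpha = "\u03B1"   # GREEK SMALL LETTER ALPHA

space_ok   = !(/\p{White_Space}/ =~ nbsp).nil?      # binary property from PropList
alpha_ok   = !(/\p{Alphabetic}/ =~ alpha).nil?      # from DerivedCoreProperties
short_name = !(/\p{Lu}/ =~ "A").nil?                # short property value name
long_name  = !(/\p{Uppercase_Letter}/ =~ "A").nil?  # long alias of the same value
script_ok  = !(/\p{Greek}/ =~ alpha).nil?           # script name
```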
* tool/enc-unicode.rb: added to generate name2ctype.kwd.
contributed by Run Paint Run Run [ruby-core:24775]
Use it as follows:
ruby19 tool/enc-unicode.rb enc/unicode/UnicodeData.txt \
enc/unicode/Scripts.txt > enc/unicode/name2ctype.kwd
* enc/unicode.c (CodeRanges): move definitions to name2ctype.h.
* enc/unicode/name2ctype.h.blt, enc/unicode/name2ctype.kwd,
enc/unicode/name2ctype.src: updated to v5.1.
* enc/unicode/UnicodeData.txt, enc/unicode/Scripts.txt: added v5.1.
* Makefile.in: add rule to generate name2ctype.kwd from
UnicodeData.txt and Scripts.txt.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e