* enc/unicode.c: moved additional Grapheme Cluster Break ranges
which depend on the Unicode version.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (node_extended_grapheme_cluster): as Unicode 10 has
added Grapheme_Cluster_Break properties to some characters,
remove duplicated ranges for Unicode 9.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (parse_char_class): initialize return values before
depth limit check. returned values will be freed in callers
regardless the error. [ruby-core:79624] [Bug #13234]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57660 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* meta character \X matches Unicode 9.0.0 characters with some workarounds
for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
[Feature #12831] [ruby-core:77586]
The term "character" can have many meanings bytes, codepoints, combined
characters, and so on. "grapheme cluster" is highest one of such words,
which means user-perceived characters.
Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION specifies how to
handle grapheme clusters (extended grapheme cluster).
But some specs aren't updated to current situation because Unicode Emoji
is rapidly extended without well definition.
It breaks the precondition of UTR#29 "Grapheme cluster boundaries can be
easily tested by looking at immediately adjacent characters". (the
sentence will be removed in the next version)
Though some of its detail are described in Unicode Technical Report #51
UNICODE EMOJI but it is not merged into UTR#29 yet.
http://unicode.org/reports/tr29/http://unicode.org/reports/tr51/http://unicode.org/Public/emoji/4.0/
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
allocate memory. This is pointed out by Facebook's Infer.
* gc.c (gc_prof_setup_new_record): ditto.
* regparse.c (parse_regexp): ditto.
* util.c (MALLOC): use xmalloc and xfree like above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
at the first character. [Feature #11949]
* regparse.c (fetch_name): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53610 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[bug] fix problem with optimization of \z (Issue #16) [Bug #8210]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40276 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
range in a character class.
* test/ruby/test_regexp.rb (TestRegexp#test_char_class): fixed wrong
test.
* test/ruby/test_regexp.rb (TestRegexp#check): now can accept the
error message.
* test/ruby/test_regexp.rb
(TextRegexp#test_raw_hyphen_and_tk_char_type_after_range): renamed
because the previous name was wrong.
* test/ruby/test_regexp.rb
(TextRegexp#test_raw_hyphen_and_tk_char_type_after_range): added
more test pattern.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37175 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (is_onechar_cclass): restructured to clarify that c is
used iff found == 1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
but did emit warnings if -Wuninitialized was set. Assigning
NULL instead if pfetch_prev should suffice the situation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Merge Onigmo 27278c12e6674043cc8affca6507e20e119a86ee.
* regparse.c (is_onechar_cclass): [bug] unexpected match occurs when a
char class contains no char
* enc/unicode.c (init_case_fold_table): define the sizes of case
folding tables in casefold.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
warning when given range is just before previous range.
[ruby-dev:41406]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@28013 b2dd03c8-39d4-4d8f-98ff-823fe69b080e