Four more uses of create_sequence_node() in node_extended_grapheme_cluster
(a few more to come).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
One more use of create_sequence_node() in node_extended_grapheme_cluster
(several more to come).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66063 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Introduce a new preprocessor macro R_ERR to visually reduce repetitive code
checking for return values and going to the err: label at the end of the
function node_extended_grapheme_cluster().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
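
The pattern described above can be sketched roughly as follows. This is only
an illustration, assuming that R_ERR stores the result of a call in a local
variable r and jumps to the err: label when the result is nonzero; the real
macro in regparse.c may be defined differently. The names step() and
run_steps() are hypothetical and do not exist in regparse.c.

```c
#include <stdlib.h>

/* Assumed shape of the macro: evaluate a call, keep its result in r,
   and jump to the err: label if the result signals failure (nonzero). */
#define R_ERR(call) do { r = (call); if (r != 0) goto err; } while (0)

/* Hypothetical step: returns the code it is given (0 means success). */
static int step(int code) { return code; }

/* Runs three steps and returns 0 on success or the first nonzero code.
   The scratch buffer is released on both the success and error paths. */
int run_steps(int a, int b, int c)
{
    int r;
    char *buf = malloc(16);
    if (buf == NULL) return -1;

    R_ERR(step(a));
    R_ERR(step(b));
    R_ERR(step(c));

    free(buf);
    return 0;

err:
    free(buf);
    return r;
}
```

Without the macro, each call would need its own "r = ...; if (r != 0) goto err;"
pair, which is the repetition the commit aims to hide.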
There are only four patterns of the last two arguments to quantify_property_node().
By replacing the lower/upper arguments with a single char, we get more expressive
calls, the last argument directly corresponding to the quantifier that we want to
use (except for '2', which means exactly two).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
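
A minimal sketch of how a single quantifier character could be translated
back into lower/upper bounds. The commit only names '2' explicitly; that the
other three patterns are '?', '*' and '+' is an assumption here, and
quantifier_bounds() and QUANT_INFINITE are hypothetical names used for
illustration only.

```c
/* Hypothetical marker for "no upper bound". */
#define QUANT_INFINITE (-1)

/* Translates a one-character quantifier into lower/upper repeat bounds.
   Returns 0 on success, -1 if the character is not a known quantifier. */
int quantifier_bounds(char quantifier, int *lower, int *upper)
{
    switch (quantifier) {
    case '?': *lower = 0; *upper = 1;              break;
    case '*': *lower = 0; *upper = QUANT_INFINITE; break;
    case '+': *lower = 1; *upper = QUANT_INFINITE; break;
    case '2': *lower = 2; *upper = 2;              break; /* exactly two */
    default:  return -1;
    }
    return 0;
}
```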
In function node_extended_grapheme_cluster(), store and test
return value from create_sequence_node(). Never forget this!
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In function node_extended_grapheme_cluster(),
move the declaration up so that the block encompasses all of the regular
expression creation that finally makes up the sequence. Having blocks like
this will be great because it directly shows the extent of the code belonging
to each subexpression of the regular expression being created.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We make sure that the newly created tree and all remaining nodes passed in
via node_array are freed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
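
The cleanup rule above can be illustrated with a simplified model. This is a
sketch under stated assumptions, not the code in regparse.c: Cell, the int
payloads, the live_payloads counter and the fail_at parameter (which simulates
an allocation failure) are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical list cell that owns its payload. */
typedef struct Cell { int *payload; struct Cell *next; } Cell;

int live_payloads = 0;  /* demo-only counter of payloads not yet freed */

int *new_payload(int v)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) { *p = v; live_payloads++; }
    return p;
}

void free_payload(int *p)
{
    if (p != NULL) { free(p); live_payloads--; }
}

void free_list(Cell *c)
{
    while (c != NULL) {
        Cell *next = c->next;
        free_payload(c->payload);
        free(c);
        c = next;
    }
}

/* Links a NULL-terminated array of payloads into a list, right to left.
   fail_at simulates an allocation failure at that index (-1 = never).
   On failure, both the partial list and the unconsumed payloads are freed. */
int link_all(Cell **out, int **payloads, int fail_at)
{
    Cell *head = NULL;
    int n = 0, i;

    *out = NULL;
    while (payloads[n] != NULL) n++;

    for (i = n - 1; i >= 0; i--) {
        Cell *cell = (i == fail_at) ? NULL : malloc(sizeof *cell);
        if (cell == NULL) goto err;
        cell->payload = payloads[i];
        payloads[i] = NULL;          /* ownership moved into the list */
        cell->next = head;
        head = cell;
    }
    *out = head;
    return 0;

err:
    free_list(head);                 /* the tree built so far */
    for (; i >= 0; i--) {            /* the inputs not yet consumed */
        free_payload(payloads[i]);
        payloads[i] = NULL;
    }
    return -1;
}
```

Because the callee releases everything on failure, the caller only has to
check the return value and does not need its own cleanup for the array.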
The new function create_sequence_node() uses its second argument
(an array of Node*, from left to right, ending with NULL_NODE)
to create a sequence of expressions using node_new_list().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66033 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
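
A simplified model of building a sequence from a NULL-terminated array, in
left-to-right order. Item, Cell, free_cells() and make_sequence() are
hypothetical stand-ins; the actual function works on Node* and uses
node_new_list(), whose exact behavior is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a parse-tree node. */
typedef struct Item { int value; } Item;

/* Hypothetical list cell; it refers to an Item but does not own it. */
typedef struct Cell { Item *car; struct Cell *cdr; } Cell;

/* Frees the cells of a list (not the items they point to). */
void free_cells(Cell *c)
{
    while (c != NULL) {
        Cell *next = c->cdr;
        free(c);
        c = next;
    }
}

/* Builds a list from a NULL-terminated array, keeping left-to-right order.
   Returns 0 and sets *out on success, -1 if an allocation fails. */
int make_sequence(Cell **out, Item **items)
{
    size_t n = 0;
    Cell *head = NULL;

    while (items[n] != NULL) n++;    /* find the terminator */

    /* Prepend from right to left so that the head ends up at items[0]. */
    while (n > 0) {
        Cell *cell = malloc(sizeof *cell);
        if (cell == NULL) {
            free_cells(head);
            *out = NULL;
            return -1;
        }
        cell->car = items[--n];
        cell->cdr = head;
        head = cell;
    }
    *out = head;
    return 0;
}
```

Passing the parts as an array keeps the call site in the same order as the
regular expression being built, which is the readability gain the commit is
after.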
The new function quantify_property_node() combines the functions
create_property_node() and quantify_node(), which frequently appear together.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Refactoring: In regparse.c, extract creation of a new CClass node and
initialization using a property into a new function create_property_node().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode.c: moved additional Grapheme Cluster Break ranges
which depend on the Unicode version.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (node_extended_grapheme_cluster): as Unicode 10 has
added Grapheme_Cluster_Break properties to some characters,
remove duplicated ranges for Unicode 9.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (parse_char_class): initialize return values before
the depth limit check. Returned values will be freed in callers
regardless of the error. [ruby-core:79624] [Bug #13234]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57660 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
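
Why the order matters can be shown with a small model. This is a sketch, not
the code of parse_char_class(): parse_level(), try_parse(), MAX_DEPTH and the
error codes are hypothetical. The point is that the output pointer is cleared
before the early return, so a caller that always frees it stays safe.

```c
#include <stdlib.h>

#define MAX_DEPTH     4     /* hypothetical nesting limit */
#define ERR_TOO_DEEP  (-1)  /* hypothetical error code */
#define ERR_NO_MEMORY (-2)  /* hypothetical error code */

/* Hypothetical parser step. *out is set to NULL before any early return. */
int parse_level(char **out, int depth)
{
    *out = NULL;                       /* initialize first */
    if (depth > MAX_DEPTH) return ERR_TOO_DEEP;

    *out = malloc(8);
    if (*out == NULL) return ERR_NO_MEMORY;
    return 0;
}

/* The caller frees the result regardless of the return code. */
int try_parse(int depth)
{
    char *result;                      /* parse_level() always sets it */
    int r = parse_level(&result, depth);
    free(result);                      /* safe: NULL or a valid pointer */
    return r;
}
```

If the depth check came first, result would still be uninitialized on the
error path and the unconditional free() would act on garbage.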
* meta character \X matches Unicode 9.0.0 characters with some workarounds
for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
[Feature #12831] [ruby-core:77586]
The term "character" can have many meanings: bytes, code points, combined
characters, and so on. "Grapheme cluster" is the highest-level of these terms
and means a user-perceived character.
Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to
handle grapheme clusters (extended grapheme clusters).
However, some specifications have not caught up with the current situation,
because Unicode Emoji is being extended rapidly without being well defined.
This breaks the precondition stated in UTR#29: "Grapheme cluster boundaries
can be easily tested by looking at immediately adjacent characters". (That
sentence will be removed in the next version.)
Some of the details are described in Unicode Technical Report #51,
UNICODE EMOJI, but they have not been merged into UTR#29 yet.
http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
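
The different meanings of "character" can be made concrete with a small C
sketch. It assumes valid UTF-8 input; utf8_codepoints() is a hypothetical
helper, not part of regparse.c. It only counts bytes and code points and does
not implement grapheme cluster segmentation.

```c
#include <stddef.h>
#include <string.h>

/* Counts code points in a valid UTF-8 string by counting every byte
   that is not a continuation byte (continuation bytes are 10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For the string "e\xCC\x81" (the letter e followed by U+0301 COMBINING ACUTE
ACCENT), strlen() reports 3 bytes and utf8_codepoints() reports 2 code points,
while a user perceives a single character; that single unit is what \X is
meant to match.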
allocate memory. This is pointed out by Facebook's Infer.
* gc.c (gc_prof_setup_new_record): ditto.
* regparse.c (parse_regexp): ditto.
* util.c (MALLOC): use xmalloc and xfree like above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54954 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
at the first character. [Feature #11949]
* regparse.c (fetch_name): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53610 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[bug] fix problem with optimization of \z (Issue #16) [Bug #8210]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40276 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
range in a character class.
* test/ruby/test_regexp.rb (TestRegexp#test_char_class): fixed wrong
test.
* test/ruby/test_regexp.rb (TestRegexp#check): now can accept the
error message.
* test/ruby/test_regexp.rb
(TestRegexp#test_raw_hyphen_and_tk_char_type_after_range): renamed
because the previous name was wrong.
* test/ruby/test_regexp.rb
(TestRegexp#test_raw_hyphen_and_tk_char_type_after_range): added
more test patterns.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37175 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* regparse.c (is_onechar_cclass): restructured to clarify that c is
used iff found == 1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
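
The invariant "c is used iff found == 1" can be sketched on a simplified
model. This is not the code of is_onechar_cclass(): the 256-entry membership
table and the name single_member() are assumptions made for illustration.

```c
/* Reports whether a 256-entry membership table has exactly one member.
   *code is written only when exactly one member was found. */
int single_member(const unsigned char set[256], unsigned int *code)
{
    int found = 0;
    unsigned int c = 0;
    unsigned int i;

    for (i = 0; i < 256; i++) {
        if (set[i]) {
            if (++found > 1) return 0;   /* second member: give up early */
            c = i;
        }
    }
    if (found == 1) {
        *code = c;                       /* c is meaningful only here */
        return 1;
    }
    return 0;
}
```

Keeping the single use of c next to the found == 1 test makes it obvious to
both readers and compilers that c is never read without having been set.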
but did emit warnings if -Wuninitialized was set. Assigning
NULL instead if pfetch_prev should suffice the situation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e