Commit graph

147 commits

Author SHA1 Message Date
duerst 456a696f72 remove code duplication and streamline identifiers
In regparse.c:

* Reduce code duplication by merging the almost identical functions
  create_sequence_node and create_alternate_node into a new function
  create_node_from_array, adding a parameter that distinguishes between
  creating a list and creating an alternative.

* Streamline variable/function naming. Unicode UAX #29 uses 'sequence', but
  the regular expression library uses 'list' for the same concept. Keep
  'sequence' in the comments that are taken from UAX #29, but use 'list'
  in variable names.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 04:46:57 +00:00
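As an illustration of the refactoring described in 456a696f72, here is a minimal sketch of how two near-identical builders might be merged into one function that takes a distinguishing parameter. The Node layout, the NodeType enum, and the helpers new_cell, new_leaf and free_node are hypothetical stand-ins rather than the definitions in regparse.c, and the real signature of create_node_from_array may well differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the regex library's node type. */
typedef enum { NODE_LEAF, NODE_LIST, NODE_ALT } NodeType;

typedef struct Node {
    NodeType type;
    struct Node *car;  /* this element */
    struct Node *cdr;  /* rest of the chain */
} Node;

#define NULL_NODE ((Node *)0)

/* Allocate one node; returns NULL_NODE if malloc fails. */
static Node *new_cell(NodeType type, Node *car, Node *cdr)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL_NODE;
    n->type = type;
    n->car = car;
    n->cdr = cdr;
    return n;
}

static Node *new_leaf(void)
{
    return new_cell(NODE_LEAF, NULL_NODE, NULL_NODE);
}

/* Recursively free a node and everything reachable from it. */
static void free_node(Node *n)
{
    if (n == NULL_NODE) return;
    free_node(n->car);
    free_node(n->cdr);
    free(n);
}

/*
 * Build a chain from a NULL_NODE-terminated array. `kind` selects
 * whether the chain is a list (NODE_LIST) or an alternative (NODE_ALT).
 * On success, returns 0, stores the head in *out, and resets the used
 * array slots to NULL_NODE because ownership moves into the chain.
 * On failure, returns -1 after freeing the partial chain and all
 * nodes still held in the array.
 */
static int create_node_from_array(NodeType kind, Node **out, Node **nodes)
{
    Node *head = NULL_NODE;
    int i = 0;

    assert(out != NULL && nodes != NULL);
    while (nodes[i] != NULL_NODE) i++;      /* find the terminator */
    while (--i >= 0) {                      /* link from right to left */
        Node *cell = new_cell(kind, nodes[i], head);
        if (cell == NULL_NODE) {
            free_node(head);
            for (; i >= 0; i--) {
                free_node(nodes[i]);
                nodes[i] = NULL_NODE;
            }
            return -1;
        }
        nodes[i] = NULL_NODE;
        head = cell;
    }
    *out = head;
    return 0;
}
```

With this shape, a caller passes NODE_LIST where it previously built a sequence and NODE_ALT where it previously built an alternative; the loop and the cleanup path exist only once.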
duerst e824e21beb remove obsolete data from unicode.c
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
  onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
  because they are not needed anymore for Unicode 11.0.0.

* regparse.c: Remove external declarations for above arrays.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 00:05:08 +00:00
duerst 6cf3ada55c remove unused variables in node_extended_grapheme_cluster()
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66218 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 10:40:17 +00:00
duerst 6e97c12e49 tweak/remove comments [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 10:11:57 +00:00
duerst 3913092624 adjust some comments in node_extended_grapheme_cluster() [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66214 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 09:00:40 +00:00
duerst 66a6073859 update to Unicode 11.0.0 (main step, not complete yet)
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
  Add generated files. Files for Unicode 10.0.0 will be removed once we are
  sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
  function node_extended_grapheme_cluster().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-05 08:10:24 +00:00
duerst b56e266d64 remove unnecessary settings with NULL_NODE in \X implementation
Remove unnecessary settings of node_array elements to NULL_NODE.
We can do this because we initialize the whole array to NULL_NODEs
and set everything again to NULL_NODEs when creating a sequence or
alternative node.

Also, fix an index error in the initialization of node_array.
(issue #15343)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66139 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 23:28:42 +00:00
duerst 4a8531db5d fix order of declarations and code at start of node_extended_grapheme_cluster()
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66138 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 22:05:36 +00:00
ko1 d80bf2f1e7 fix last commit (r66135)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66137 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 22:01:29 +00:00
duerst f43a2a5a49 make sure all nodes are freed on error in node_extended_grapheme_cluster()
regparse.c: In function node_extended_grapheme_cluster(), introduce function-global
array node_array and use it for sequence and alternate construction. This is done
so that in case of error, all nodes that have already been constructed can be
correctly freed. (issue #15343)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66135 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 21:41:50 +00:00
duerst 1752d13827 expand a small comment [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 10:00:35 +00:00
duerst 096d362939 add/change some comments in node_extended_grapheme_cluster() [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66123 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 01:33:24 +00:00
duerst d359ee3f55 reformat code [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-02 01:06:28 +00:00
duerst d1f8694171 remove unnecessary code removing CR/LF from range
Remove code that tries to remove CR and LF from Grapheme_Cluster_Break=Control.
This code is unnecessary because Grapheme_Cluster_Break=Control already excludes
CR and LF.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-01 08:26:39 +00:00
svn 36b3e41a57 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66115 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-01 07:31:35 +00:00
duerst 3feeed6e99 introduce and use create_alternate_node()
Introduce new function create_alternate_node() to create an alternative node
from a list of nodes in one go. Use it once (two more uses expected).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66114 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-01 07:31:34 +00:00
duerst 1fa7087f10 eliminate a list with only one element
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-01 06:07:53 +00:00
duerst c80aeb527e remove two unnecessary variables (np2 and np3)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 07:07:59 +00:00
duerst d3b7a10dcc eliminate intermediate variable in very short block (3 times)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 06:43:57 +00:00
duerst 33c7fa6501 use create_sequence_node() four more times
Four more uses of create_sequence_node() in node_extended_grapheme_cluster
(a few more to come).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 06:21:52 +00:00
duerst b4e39021f2 use create_sequence_node() once more
One more use of create_sequence_node() in node_extended_grapheme_cluster
(several more to come).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66063 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 01:58:35 +00:00
duerst 97a8334cd3 introduce macro R_ERR to reduce repetitive code
Introduce a new preprocessor macro R_ERR to visually reduce repetitive code
checking for return values and going to the err: label at the end of the
function node_extended_grapheme_cluster().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 00:53:35 +00:00
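The commit message for 97a8334cd3 does not show the macro itself. The sketch below shows one common way such a macro can be written: it assigns the call result to a local variable r and jumps to an err: label when the result is non-zero. This definition of R_ERR is an assumption, and make_buffer and build are hypothetical functions used only to exercise it.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definition: store the result in `r`, jump to `err` on failure. */
#define R_ERR(call) do { r = (call); if (r != 0) goto err; } while (0)

/* Hypothetical step that allocates a buffer and can be forced to fail. */
static int make_buffer(char **out, int fail)
{
    if (fail) return -1;
    *out = malloc(16);
    return (*out != NULL) ? 0 : -5;
}

/*
 * Runs two steps in order. Returns 0 and hands both buffers to the
 * caller on success; otherwise returns the first error code after
 * freeing whatever had already been allocated.
 */
static int build(int fail_at, char **a_out, char **b_out)
{
    int r;
    char *a = NULL;
    char *b = NULL;

    assert(a_out != NULL && b_out != NULL);
    R_ERR(make_buffer(&a, fail_at == 1));
    R_ERR(make_buffer(&b, fail_at == 2));

    *a_out = a;
    *b_out = b;
    return 0;

err:
    free(a);   /* free(NULL) is a no-op, so no extra checks are needed */
    free(b);
    return r;
}
```

Each R_ERR line replaces a three-line "assign, test, goto" sequence, which is where the visual reduction comes from.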
duerst 42cb4feda1 reduce number of arguments on quantify_property_node()
There are only four patterns of the last two arguments to quantify_property_node().
By replacing the lower/upper arguments with a single char, we get more expressive
calls, the last argument directly corresponding to the quantifier that we want to
use (except for '2', which means exactly two).

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-28 00:28:52 +00:00
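To make the idea in 42cb4feda1 concrete, here is a small sketch of how a single quantifier character could be translated back into lower and upper bounds. The commit message names only '2' explicitly; treating the other three patterns as '?', '*' and '+' is an assumption, as are the helper name repetition_bounds and the REPEAT_INFINITE value.

```c
#include <assert.h>
#include <stddef.h>

#define REPEAT_INFINITE (-1)  /* assumed sentinel for "no upper bound" */

/*
 * Translate a quantifier character into repetition bounds.
 * Returns 0 on success and -1 for an unknown character.
 */
static int repetition_bounds(char repetitions, int *lower, int *upper)
{
    assert(lower != NULL && upper != NULL);
    switch (repetitions) {
    case '?': *lower = 0; *upper = 1;               break;
    case '*': *lower = 0; *upper = REPEAT_INFINITE; break;
    case '+': *lower = 1; *upper = REPEAT_INFINITE; break;
    case '2': *lower = 2; *upper = 2;               break;  /* exactly two */
    default:  return -1;
    }
    return 0;
}
```

A call site then reads like the regular expression it builds, with '+' or '*' as the last argument instead of a pair of integers.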
duerst 7c4a422d83 fix order of subexpressions for Hangul
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66048 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:50:27 +00:00
svn 793c48b4ce * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:38:55 +00:00
duerst 06bd42a722 introduce two more uses of create_sequence_node in node_extended_grapheme_cluster
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:38:53 +00:00
duerst 99d451f5c5 correctly handle return value from create_sequence_node()
In function node_extended_grapheme_cluster(), store and test
return value from create_sequence_node(). Never forget this!

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:23:11 +00:00
svn 87cda89ca7 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:12:31 +00:00
duerst 8186889120 declare array for sequence at start of code creating sequence
In function node_extended_grapheme_cluster(),
move declaration up so that block encompasses all of the regular expression
creation that finally makes up the sequence. Having blocks like this will
be great because it directly shows the extent of code belonging to each
subexpression of the regular expression being created.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:12:29 +00:00
duerst 7086aae378 make sure all nodes are correctly freed in create_property_node()
We make sure that the newly created tree and all remaining nodes passed in
via node_array are freed.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 21:00:06 +00:00
k0kubun c3fe307808 regparse.c: conform C90
../regparse.c:5908:28: error: initializer for aggregate is not a compile-time constant [-Werror,-Wc99-extensions]
      Node* sequence[] = { np1, np2, np3, ((Node* )0) };
                           ^~~

https://travis-ci.org/ruby/ruby/jobs/460197620

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 11:24:19 +00:00
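The compiler error quoted in c3fe307808 arises because C90 requires the initializers of an aggregate to be constant expressions, so an automatic array cannot be initialized from function parameters or other run-time values. One C90-conforming way around this, shown here as a sketch and not necessarily the change made in that commit, is to declare the array with a fixed size and assign the elements one by one. The Node type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the regex library's node type. */
typedef struct Node { int id; } Node;

#define NULL_NODE ((Node *)0)

/* Count the entries before the NULL_NODE terminator. */
static size_t count_nodes(Node **nodes)
{
    size_t n = 0;
    assert(nodes != NULL);
    while (nodes[n] != NULL_NODE) n++;
    return n;
}

static size_t build_sequence_c90(Node *np1, Node *np2, Node *np3)
{
    /*
     * Rejected under C90, because np1..np3 are not constant expressions:
     *     Node *sequence[] = { np1, np2, np3, NULL_NODE };
     */
    Node *sequence[4];

    sequence[0] = np1;
    sequence[1] = np2;
    sequence[2] = np3;
    sequence[3] = NULL_NODE;
    return count_nodes(sequence);
}
```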
duerst 9e2455b756 introduce helper function create_sequence_node()
The new function create_sequence_node() uses its second argument
(an array of Node*, from left to right, ending with NULL_NODE)
to create a sequence of expressions using node_new_list().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66033 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 11:04:02 +00:00
svn 231930ca7d * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66032 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 10:15:46 +00:00
duerst 8f9c00d207 introduce helper function quantify_property_node()
The new function quantify_property_node() combines the functions
create_property_node() and quantify_node(), which frequently appear together.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 10:15:45 +00:00
duerst 69443998cd introduce helper function quantify_node() to wrap function node_new_quantifier
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66030 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 09:44:35 +00:00
duerst aa44935969 use explicit property name when creating nodes for "Grapheme_Cluster_Break=Extend"
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 04:13:21 +00:00
duerst b62e466fb5 use 'Regional_Indicator' script property instead of fixed constants
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66020 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 03:56:19 +00:00
duerst 2e07575914 add some comments in function node_extended_grapheme_cluster() [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66014 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-27 02:18:29 +00:00
duerst 9a4d120736 create function create_property_node to extract recurring functionality
Refactoring: In regparse.c, extract creation of a new CClass node and
initialization using a property into a new function create_property_node().


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-11-25 11:02:41 +00:00
nobu 6e9fc98d19 regparse.c: check the result of propname2ctype
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65094 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-16 08:13:58 +00:00
nobu e07c5baf66 unicode.c: moved additional GCB ranges
* enc/unicode.c: moved additional Grapheme Cluster Break ranges
  which depend on the Unicode version.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-15 13:48:20 +00:00
nobu 179045acaf regparse.c: Suppress duplicated range warning by mere \X
* regparse.c (node_extended_grapheme_cluster): as Unicode 10 has
  added Grapheme_Cluster_Break properties to some characters,
  remove duplicated ranges for Unicode 9.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-15 12:31:25 +00:00
nobu f31c5e72b2 regparse.c: warn all duplicated ranges when debugging
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-10-15 12:31:22 +00:00
hsbt ef9bc609db Fix typos.
* rememberd -> remembered
* refered -> referred

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61933 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-18 11:44:10 +00:00
naruse 31796f17d3 Update to Onigmo 6.1.3-669ac9997619954c298da971fcfacccf36909d05.
[Bug #13892]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60966 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-12-01 13:50:13 +00:00
nobu ea940cc4dc regparse.c: initialize return values
* regparse.c (parse_char_class): initialize return values before
  depth limit check.  returned values will be freed in callers
  regardless of the error.  [ruby-core:79624] [Bug #13234]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57660 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-20 09:46:12 +00:00
naruse 6b1c6e0e55 Merge Onigmo 6.1.1
* Support absent operator https://github.com/k-takata/Onigmo/issues/82
* https://github.com/k-takata/Onigmo/blob/Onigmo-6.1.1/HISTORY

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57603 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-11 15:08:33 +00:00
naruse 2873edeafb Merge Onigmo 6.0.0
* https://github.com/k-takata/Onigmo/blob/Onigmo-6.0.0/HISTORY
* fix for ruby 2.4: https://github.com/k-takata/Onigmo/pull/78
* suppress warning: https://github.com/k-takata/Onigmo/pull/79
* include/ruby/oniguruma.h: include onigmo.h.
* template/encdb.h.tmpl: ignore duplicated definition of EUC-CN in
  enc/euc_kr.c. It is defined in enc/gb2312.c with CRuby macro.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-12-10 17:47:04 +00:00
naruse c11e648799 Regexp supports Unicode 9.0.0's \X
* meta character \X matches Unicode 9.0.0 characters with some workarounds
  for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences.
  [Feature #12831] [ruby-core:77586]

The term "character" can have many meanings: bytes, code points, combined
characters, and so on. "Grapheme cluster" is the highest-level one of these
terms and means a user-perceived character.
Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to
handle grapheme clusters (extended grapheme clusters).
But some specs have not been updated to the current situation, because
Unicode Emoji is being extended rapidly without being well defined.
This breaks the precondition of UAX #29 that "Grapheme cluster boundaries
can be easily tested by looking at immediately adjacent characters" (the
sentence will be removed in the next version).
Some of the details are described in Unicode Technical Report #51,
UNICODE EMOJI, but they have not been merged into UAX #29 yet.

http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-11-30 17:29:19 +00:00
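The different meanings of "character" mentioned in c11e648799 can be seen with a very small example. The sketch below counts bytes and code points for the UTF-8 string "e" followed by U+0301 COMBINING ACUTE ACCENT, which a user perceives as one character and which UAX #29 treats as a single extended grapheme cluster. The helper utf8_code_points is hypothetical and assumes well-formed UTF-8; it does not implement grapheme cluster segmentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count code points in a well-formed UTF-8 string by counting every
 * byte that is not a continuation byte (bit pattern 10xxxxxx).
 */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}

/* "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes 0xCC 0x81). */
static const char E_WITH_ACUTE[] = "e\xCC\x81";
```

For E_WITH_ACUTE, strlen reports 3 bytes and utf8_code_points reports 2 code points, while \X is meant to match the whole string as one unit.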
naruse 05c631eefd * regparse.c (fetch_token_in_cc): raise error if given octal escaped
character is too big. [Bug #12420] [Bug #12423]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55163 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-05-25 09:45:22 +00:00