github/ruby - ruby

Граф коммитов

Автор	SHA1	Сообщение	Дата
Peter Zhu	37ed86fd3c	Fix memory leak in regexp grapheme clusters [Bug #20161] The cc->mbuf gets overwritten, so we need to free it to not leak memory. For example: str = "hello world".encode(Encoding::UTF_32LE) 10.times do 1_000.times do str.grapheme_clusters end puts `ps -o rss= -p #{$$}` end Before: 15536 15760 15920 16144 16304 16480 16640 16784 17008 17280 After: 15584 15584 15760 15824 15888 15888 15888 15888 16048 16112	2024-01-08 19:50:34 -05:00
Adam Hess	f694bd158c	Improve error and memory handling Apply Nobu's suggestions which improve style, memory handling and error correction. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2023-11-08 08:05:58 -05:00
Adam Hess	05cde4155c	fix regex from regex memory corruption before this change, creating a regex from a regex with a named capture, Regexp.new(/(?<name>)/), causes memory to be shared between the two named capture groups which can cause a segfault if the original is GCed.	2023-11-08 08:05:58 -05:00
Nobuyoshi Nakada	5cff4c5aa3	Fix onigmo name table without st Co-authored-by: Adam Hess <HParker@github.com>	2023-11-03 10:41:48 +09:00
Nobuyoshi Nakada	4218e913d8	Fix functions for name tables as `st_foreach_callback_func`	2023-11-02 15:00:39 +09:00
Peter Zhu	58386814a7	Don't check for null pointer in calls to free According to the C99 specification section 7.20.3.2 paragraph 2: > If ptr is a null pointer, no action occurs. So we do not need to check that the pointer is a null pointer.	2023-06-30 09:13:31 -04:00
Yusuke Endoh	1d2d25dcad	Prevent potential buffer overrun in onigmo A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun if incomplete bytes of a UTF-8 character is placed at the end of a string. Because this pattern is used in several places in onigmo, this change fixes the issue in the side of `enclen`: the function should not return a number that is larger than `pend - p`. Co-Authored-By: Nobuyoshi Nakada <nobu@ruby-lang.org>	2022-10-25 17:02:43 +09:00
Yusuke Endoh	902e459b73	Prevent buffer overrun in regparse.c A regexp that ends with an escape following an incomplete UTF-8 char might cause buffer overrun. Found by OSS-Fuzz. ``` $ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0 \\\xE6".b)' ==296213== Memcheck, a memory error detector ==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b) ==296213== ==296213== Warning: client switching stacks? SP change: 0x1ffe8020e0 --> 0x1ffeffff10 ==296213== to suppress, use: --max-stackframe=8379952 or greater ==296213== Invalid read of size 1 ==296213== at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==296213== by 0x339568: memcpy (string_fortified.h:29) ==296213== by 0x339568: onig_strcpy (regparse.c:271) ==296213== by 0x339568: onig_node_str_cat (regparse.c:1413) ==296213== by 0x33CBA0: parse_exp (regparse.c:6198) ==296213== by 0x33EDE4: parse_branch (regparse.c:6511) ==296213== by 0x33EEA2: parse_subexp (regparse.c:6544) ==296213== by 0x34019C: parse_regexp (regparse.c:6593) ==296213== by 0x34019C: onig_parse_make_tree (regparse.c:6638) ==296213== by 0x32782D: onig_compile_ruby (regcomp.c:5779) ==296213== by 0x313EFA: onig_new_with_source (re.c:876) ==296213== by 0x313EFA: make_regexp (re.c:900) ==296213== by 0x313EFA: rb_reg_initialize (re.c:3136) ==296213== by 0x318555: rb_reg_initialize_str (re.c:3170) ==296213== by 0x318555: rb_reg_init_str (re.c:3205) ==296213== by 0x31A669: rb_reg_initialize_m (re.c:3856) ==296213== by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150) ==296213== by 0x3E5165: vm_call0_cfunc (vm_eval.c:164) ==296213== by 0x3E5165: vm_call0_body (vm_eval.c:210) ==296213== by 0x3E89BD: vm_call0_cc (vm_eval.c:87) ==296213== by 0x3E89BD: rb_call0 (vm_eval.c:551) ==296213== Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd ==296213== at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==296213== by 0x20FA7B: objspace_xmalloc0 (gc.c:12146) ==296213== by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132) ==296213== by 0x31359D: unescape_escaped_nonascii (re.c:2690) ==296213== by 0x313A9D: unescape_nonascii (re.c:2869) ==296213== by 0x313A9D: rb_reg_preprocess (re.c:2992) ==296213== by 0x313DFC: rb_reg_initialize (re.c:3109) ==296213== by 0x318555: rb_reg_initialize_str (re.c:3170) ==296213== by 0x318555: rb_reg_init_str (re.c:3205) ==296213== by 0x31A669: rb_reg_initialize_m (re.c:3856) ==296213== by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150) ==296213== by 0x3E5165: vm_call0_cfunc (vm_eval.c:164) ==296213== by 0x3E5165: vm_call0_body (vm_eval.c:210) ==296213== by 0x3E89BD: vm_call0_cc (vm_eval.c:87) ==296213== by 0x3E89BD: rb_call0 (vm_eval.c:551) ==296213== by 0x3E957B: rb_call (vm_eval.c:877) ==296213== by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074) ==296213== by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991) ==296213== ==296213== ==296213== HEAP SUMMARY: ==296213== in use at exit: 35,476,538 bytes in 9,489 blocks ==296213== total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated ==296213== ==296213== LEAK SUMMARY: ==296213== definitely lost: 316,081 bytes in 2,989 blocks ==296213== indirectly lost: 136,808 bytes in 2,361 blocks ==296213== possibly lost: 1,048,624 bytes in 3 blocks ==296213== still reachable: 33,975,025 bytes in 4,136 blocks ==296213== suppressed: 0 bytes in 0 blocks ==296213== Rerun with --leak-check=full to see details of leaked memory ==296213== ==296213== For lists of detected and suppressed errors, rerun with: -s ==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ```	2022-10-25 13:20:25 +09:00
Kevin Backhouse	8c1808151f	Fix some UBSAN false positives (#6115 ) * Fix some UBSAN false positives. * ruby tool/update-deps --fix	2022-07-12 11:48:10 -07:00
Yusuke Endoh	f44547c999	regparse.c: Suppress false-positive warnings of GCC 12.1 http://rubyci.s3.amazonaws.com/arch/ruby-master/log/20220613T030003Z.log.html.gz ``` regparse.c:264:15: warning: array subscript 56 is outside array bounds of ‘Node[1]’ {aka ‘struct _Node[1]’} [-Warray-bounds] ``` and ``` /usr/include/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ pointer overflow between offset 32 and size [9223372036854775792, 9223372036854775807] [-Warray-bounds] ```	2022-06-21 11:32:02 +09:00
Nobuyoshi Nakada	efa0c31ce5	Add printf-style format attribute to oniguruma functions Also make the format string compatible with literal strings which are const arrays of "plain" chars.	2021-09-27 19:02:45 +09:00
Jeremy Evans	9e73177d53	Do not reduce quantifiers if it affects which text will be matched Quantifier reduction when using +?)* and +?)+ should not be done as it affects which text will be matched. This removes the need for the RQ_PQ_Q ReduceType, so remove the enum entry and related switch case. Test that these are the only two patterns affected by testing all quantifier reduction tuples for both the captured and uncaptured cases and making sure the matched text is the same for both. Fixes [Bug #17341]	2020-12-02 18:42:02 +01:00
Jeremy Evans	b26d6c70e0	Detect the premature end of char property in regexp Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in fetch_char_property_to_ctype and only set otherwise if an ending } is found. Fixes [Bug #17340]	2020-11-24 16:01:30 +01:00
Nobuyoshi Nakada	db16629008	Fixed misspellings Fixed misspellings reported at [Bug #16437], only in ruby and rubyspec.	2019-12-20 09:32:42 +09:00
卜部昌平	6dd60cf114	st_foreach now free from ANYARGS After `5e86b005c0`, I now think ANYARGS is dangerous and should be extinct. This commit deletes ANYARGS from st_foreach. I strongly believe that this commit should have had come with `b0af0592fd`, which added extra parameter to st_foreach callbacks.	2019-08-27 15:52:26 +09:00
Nobuyoshi Nakada	2f6cc15cdb	Fixed String#grapheme_clusters with wide encodings * string.c (get_reg_grapheme_cluster): make regexp from properly encoded sources fro wide-char encodings. [Bug #15965] * regparse.c (node_extended_grapheme_cluster): suppress false duplicated range warning for the time being.	2019-06-29 10:10:17 +09:00
duerst	9bb2da28d5	convert check for array length to assertion and comment out In regparse.c, in function node_extended_grapheme_cluster, we used a raw if() with exit(1) as a cross-check for our length calculations for the common node array. Convert this to an assertion and comment it out because it is not needed for active code. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-07 07:20:26 +00:00
duerst	7780553c30	remove code duplication and put everything into forward order In file regparse.c, in function node_extended_grapheme_cluster(), eliminate code duplication of CRLF and '.' (any character). This uses the fact that both for Unicode encodings and for non-Unicode encodings, the first alternative is CRLF, and the last alternative is '.' (any character). This puts all of the pieces into forward order (the order of the code follows the order of the syntax definition). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-07 07:04:00 +00:00
duerst	9d17024095	remove an unused variable git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66240 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-06 09:54:09 +00:00
duerst	c114e275c9	make sure all nodes are freed on error in node_extended_grapheme_cluster() regparse.c: In function node_extended_grapheme_cluster(), use function-global array node_common and use it for list and alternate construction. This is done so that in case of error, all nodes that have already been constructed can be correctly freed in a single for loop. Document the layout structure. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-06 09:16:43 +00:00
duerst	456a696f72	remove code duplication and streamline identifiers In regparse.c: * Reduce coode duplication by merging the almost identical functions create_sequence_node and create_alternate_node into a new function create_node_from_array, adding a parameter that distinguishes between creating a list and creating an alternative. * Streamline variable/function naming. Unicode UAX #29 uses 'sequence', but the regular expression library uses 'list' for the same concept. Keep 'sequence' in the ccmments that are taken from UAX #29, but use 'list' in variable names. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-06 04:46:57 +00:00
duerst	e824e21beb	remove obsolete data from unicode.c * unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ, onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji, because they are not needed anymore for Unicode 11.0.0. * regparse.c: Remove external declarations for above arrays. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-06 00:05:08 +00:00
duerst	6cf3ada55c	remove unused variables in node_extended_grapheme_cluster() git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66218 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-05 10:40:17 +00:00
duerst	6e97c12e49	tweak/remove comments [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-05 10:11:57 +00:00
duerst	3913092624	adjust some comments in node_extended_grapheme_cluster() [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66214 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-05 09:00:40 +00:00
duerst	66a6073859	update to Unicode 11.0.0 (main step, not complete yet) - common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0 - test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version - enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h: Add generated files. Files for Unicode 10.0.0 will be removed once we are sure 11.0.0 works. - lib/unicode_normalize/tables.rb: Updated table. - regparse.c: Almost completely reimplement grapheme cluster detection in function node_extended_grapheme_cluster(). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-05 08:10:24 +00:00
duerst	b56e266d64	remove unnecessary settings with NULL_NODE in \X implementation Remove unnecessary settings of node_array elements to NULL_NODE. We can do this because we initialize the whole array to NULL_NODEs and set everything again to NULL_NODEs when creating a sequence or alternative node. Also, fix an index error in the initialization of node_array. (issue #15343) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66139 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 23:28:42 +00:00
duerst	4a8531db5d	fix order of declarations and code at start of node_extended_grapheme_cluster() git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66138 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 22:05:36 +00:00
ko1	d80bf2f1e7	fix last commit (r66135) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66137 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 22:01:29 +00:00
duerst	f43a2a5a49	make sure all nodes are freed on error in node_extended_grapheme_cluster() regparse.c: In function node_extended_grapheme_cluster(), introduce function-global array node_array and use it for sequence and alternate construction. This is done so that in case of error, all nodes that have already been constructed can be correctly freed. (issue #15343) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66135 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 21:41:50 +00:00
duerst	1752d13827	expand a small comment [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 10:00:35 +00:00
duerst	096d362939	add/change some comments in node_extended_grapheme_cluster() [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66123 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 01:33:24 +00:00
duerst	d359ee3f55	reformat code [ci skip] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-02 01:06:28 +00:00
duerst	d1f8694171	remove unnecessary code removing CR/LF from range Remove code that tries to remove CR and LF from Grapheme_Cluster_Break=Control. This code is unnecessary because Grapheme_Cluster_Break=Control already excludes CR and LF. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-01 08:26:39 +00:00
svn	36b3e41a57	* remove trailing spaces. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66115 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-01 07:31:35 +00:00
duerst	3feeed6e99	introduce and use create_alternate_node() Introduce new function create_alternate_node() to create an alternative node from a list of nodes in one go. Use it once (two more uses expected). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66114 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-01 07:31:34 +00:00
duerst	1fa7087f10	eliminate a list with only one element git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-12-01 06:07:53 +00:00
duerst	c80aeb527e	remove two unnecessary variables (np2 and np3) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 07:07:59 +00:00
duerst	d3b7a10dcc	eliminate intermediate variable in very short block (3 times) git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 06:43:57 +00:00
duerst	33c7fa6501	use create_sequence_node() four more times Four more use of create_sequence_node() in node_extended_grapheme_cluster (a few more to come). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 06:21:52 +00:00
duerst	b4e39021f2	use create_sequence_node() once more One more use of create_sequence_node() in node_extended_grapheme_cluster (several more to come). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66063 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 01:58:35 +00:00
duerst	97a8334cd3	introduce macro R_ERR to reduce repetitive code Introduce a new preprocessor macro R_ERR to visually reduce repetitive code checking for return values and going to the err: label at the end of the function node_extended_grapheme_cluster(). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 00:53:35 +00:00
duerst	42cb4feda1	reduce number of arguments on quantify_property_node() There are only four patterns of the last two arguments to quantify_property_node(). By replacing the lower/upper arguments with a single char, we get more expressive calls, the last argument directly corresponding to the quantifier that we want to use (except for '2', which means exactly two). git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-28 00:28:52 +00:00
duerst	7c4a422d83	fix order of subexpressions for Hangul git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66048 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:50:27 +00:00
svn	793c48b4ce	* remove trailing spaces. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:38:55 +00:00
duerst	06bd42a722	introduce two more uses of create_sequence_node in node_extended_grapheme_cluster git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:38:53 +00:00
duerst	99d451f5c5	correctly handle return value from create_sequence_node() In function node_extended_grapheme_cluster(), store and test return value from create_sequence_node(). Never forget this! git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:23:11 +00:00
svn	87cda89ca7	* remove trailing spaces. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:12:31 +00:00
duerst	8186889120	declare array for sequence at start of code creating sequence In function node_extended_grapheme_cluster(), move declaration up so that block encompasses all of the regular expression creation that finally makes up the sequence. Having blocks like this will be great because it directly shows the extent of code belonging to each subexpression of the regular expression being created. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:12:29 +00:00
duerst	7086aae378	make sure all nodes are correctly freed in create_property_node() We make sure that the newly created tree and all remaining nodes passed in in the node_array are freed. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e	2018-11-27 21:00:06 +00:00

1 2 3 4

167 Коммитов