Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.
Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.
Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class. Handle nested character classes as well.
This issue doesn't just affect extended regexps, it also affects
"(#?" comments inside all regexps. So for those comments, scan
until trailing ")" and ignore content inside.
I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescaped during parsing instead of before parsing, so you already
know the current parsing state.
Fixes [Bug #18294]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Treats:
#fixed_encoding?
#hash
#==
#=~
#match
#match?
Also, in regexp.rdoc:
Changes heading from 'Special Global Variables' to 'Regexp Global Variables'.
Add tiny section 'Regexp Interpolation'.
In certain conditions, Regexp#match could return a MatchData with
missing captures. This seems to require at the least, multiple
threads calling a method that calls the same block/proc/lambda
which calls Regexp#match.
The race condition happens because the MatchData is passed from
indirectly via the backref, and other threads can modify the
backref.
Fix the issue by:
1. Not reusing the existing MatchData from the backref, and always
allocating a new MatchData.
2. Passing the MatchData directly to the caller using a VALUE*,
instead of indirectly through the backref.
It's likely that variants of this issue exist for other Regexp
methods. Anywhere that MatchData is passed implicitly through
the backref is probably vulnerable to this issue.
Fixes [Bug #17507]
Also document that both :deprecated and :experimental are supported
:category option values.
The locations where warnings were marked as deprecation warnings
was previously reviewed by shyouhei.
Comment a couple locations where deprecation warnings should probably
be used but are not currently used because deprecation warning
enablement has not occurred at the time they are called
(RUBY_FREE_MIN, RUBY_HEAP_MIN_SLOTS, -K).
Add assert_deprecated_warn to test assertions. Use this to simplify
some tests, and fix failing tests after marking some warnings with
deprecated category.
`String#sub` with a string pattern defers creating a `Regexp`
until `MatchData#regexp` creates a `Regexp` from the matched
string. `Regexp#last_match(group_name)` accessed its content
without creating the `Regexp` though. [Bug #16508]