Commit graph

649 commits

Author SHA1 Message Date
Alan Wu 5a570421a5 [DOC] Regexp.last_match returns `$~`, not `$!` 2024-08-09 16:02:36 -04:00
Peter Zhu 7464514ca5 Fix memory leak in String#start_with? when regexp times out
[Bug #20653]

This commit refactors how Onigmo handles timeouts. Instead of raising a
timeout error itself, onig_search now returns ONIGERR_TIMEOUT so that the
caller can free memory and then raise the timeout error.

This fixes a memory leak in String#start_with? when the regexp times out.
For example:

    regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        str.start_with?(regex)
      rescue
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    33216
    51936
    71152
    81728
    97152
    103248
    120384
    133392
    133520
    133616

After:

    14912
    15376
    15824
    15824
    16128
    16128
    16144
    16144
    16160
    16160
2024-07-26 08:42:38 -04:00
Shugo Maeda e048a073a3 Add MatchData#bytebegin and MatchData#byteend
These methods return the byte-based offset of the beginning or end of the specified match.

[Feature #20576]
2024-07-16 14:48:06 +09:00
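As a rough illustration of what byte-based offsets mean here, the sketch below compares a character offset with a byte offset for a match in a multibyte string. The helper name `byte_span` is hypothetical, and the fallback branch is only an assumption for interpreters that do not yet provide `MatchData#bytebegin` and `MatchData#byteend`.

```ruby
# Hypothetical helper: returns [byte_begin, byte_end] for a capture group.
# Uses MatchData#bytebegin/#byteend when the interpreter provides them,
# otherwise derives the same numbers from the character offsets.
def byte_span(match, group = 0)
  if match.respond_to?(:bytebegin)
    [match.bytebegin(group), match.byteend(group)]
  else
    str = match.string
    [str[0, match.begin(group)].bytesize, str[0, match.end(group)].bytesize]
  end
end

m = /(b)/.match("\u00E9b") # U+00E9 occupies 2 bytes in UTF-8
p m.begin(1)               # character offset of group 1
p byte_span(m, 1)          # byte offsets of group 1
```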
Jean Boussier 3a7846b1aa Add a hint of `ASCII-8BIT` being `BINARY`
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed too backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
2024-04-18 10:17:26 +02:00
Peter Zhu 01bfd1a2bf Fix memory leak in OnigRegion when match raises
[Bug #20228]

rb_reg_onig_match can raise a Regexp::TimeoutError, which would cause
the OnigRegion to leak.
2024-02-02 10:39:42 -05:00
Peter Zhu 1c120efe02 Fix memory leak in stk_base when Regexp timeout
[Bug #20228]

If rb_reg_check_timeout raises a Regexp::TimeoutError, then the stk_base
will leak.
2024-02-02 10:39:42 -05:00
git 5b6167c252 * expand tabs. [ci skip]
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2024-01-07 15:50:59 +00:00
Nobuyoshi Nakada c30b8ae947
Adjust styles and indents [ci skip] 2024-01-08 00:50:41 +09:00
Luke Gruber e12d4c654e Don't create T_MATCH object if /regexp/.match(string) doesn't match
Fixes [Bug #20104]
2024-01-01 13:28:26 -08:00
Peter Zhu f0efeddd41 Fix Regexp#inspect for GC compaction
rb_reg_desc was not safe for GC compaction because it took in the C
string and length but not the backing String object, so the string could
get moved during compaction. This commit changes rb_reg_desc to use the
string from the Regexp object.

The test fails when RGENGC_CHECK_MODE is turned on:

    TestRegexp#test_inspect_under_gc_compact_stress [test/ruby/test_regexp.rb:474]:
    <"(?-mix:\\/)|"> expected but was
    <"/\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00/">.
2023-12-24 11:04:41 -05:00
Peter Zhu 42442ed789 Fix Regexp#match for GC compaction
The test fails when RGENGC_CHECK_MODE is turned on:

    TestRegexp#test_match_under_gc_compact_stress:
    NoMethodError: undefined method `match' for nil
        test_regexp.rb:878:in `block in test_match_under_gc_compact_stress'
2023-12-24 09:03:55 -05:00
Peter Zhu fadda88903 Fix Regexp#to_s for GC compaction
The test fails when RGENGC_CHECK_MODE is turned on:

    TestRegexp#test_to_s_under_gc_compact_stress = 13.46 s
    1) Failure:
    TestRegexp#test_to_s_under_gc_compact_stress [test/ruby/test_regexp.rb:81]:
    <"(?-mix:abcd\u3042)"> expected but was
    <"(?-mix:\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030)">.
2023-12-23 16:52:05 -05:00
Nobuyoshi Nakada dee45ac231
[DOC] State MatchData#[] when multiple captures with the same name 2023-12-19 13:48:51 +09:00
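For readers unfamiliar with the documented case, here is a minimal sketch of `MatchData#[]` when two groups share a name. The helper `named_value` is a hypothetical name used only for this illustration; the behavior shown is that the last group that matched supplies the value.

```ruby
# Hypothetical helper: look up a named capture, or nil when nothing matched.
def named_value(re, str, name)
  m = re.match(str)
  m && m[name]
end

# Both groups are called "foo"; the second one matched last.
p named_value(/(?<foo>.)(?<foo>.+)/, "hoge", :foo)
```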
Victor Shepelev 570d7b2c3e
[DOC] Adjust some new features wording/examples. (#9183)
* Reword Range#overlap? docs last paragraph.

* Docs: add explanation about Queue#freeze

* Docs: Add :rescue event docs for TracePoint

* Docs: Enhance Module#set_temporary_name documentation

* Docs: Slightly expand Process::Status deprecations

* Fix MatchData#named_captures rendering glitch

* Improve Dir.fchdir examples

* Adjust Refinement#target docs
2023-12-14 23:01:48 +02:00
Dustin Brown d89280e8bf
Copy encoding flags when copying a regex [Bug #20039]
* 🐛 Fixes [Bug #20039](https://bugs.ruby-lang.org/issues/20039)

When a Regexp is initialized with another Regexp, we simply copy the
properties from the original. However, the flags on the original were
not being copied correctly. This caused an issue when the original had
multibyte characters and was being compared with an ASCII string.
Without the forced encoding flag (`KCODE_FIXED`) transferred on to the
new Regexp, the comparison would fail. See the included test for an
example.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2023-12-06 19:25:29 -08:00
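The property this fix restores can be checked with a small sketch like the one below, which copies a Regexp and compares the copy with the original. The helper name `copy_preserves_flags?` is hypothetical, and this is not the test included in the commit, only an assumption about a reasonable way to observe the flags from Ruby.

```ruby
# Hypothetical helper: true when Regexp.new(re) keeps the options,
# the encoding, and the fixed-encoding flag of the original.
def copy_preserves_flags?(re)
  copy = Regexp.new(re)
  copy == re &&
    copy.options == re.options &&
    copy.encoding == re.encoding &&
    copy.fixed_encoding? == re.fixed_encoding?
end

original = /\u3042/i # a non-ASCII literal, so its encoding is fixed
p original.fixed_encoding?
p copy_preserves_flags?(original)
```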
Nobuyoshi Nakada caa9881fde
[DOC] Fix doc/regexp.rdoc links
- Rename regexp.rdoc to exclude it from "Pages".  This file is meant to
  be included in the "class Regexp" document, but it also appeared as a
  separate, duplicate page.
- Fix links on case-sensitive filesystems.
- Fix to use rdoc-ref instead of converted HTML page names.
2023-11-14 15:56:57 +09:00
Herwin 8b3d044004
[DOC] Indentation fix in comments of MatchData#inspect
The old version did not add syntax highlighting to the code block, and
included the "Related:" line in the code block as well.
2023-10-20 18:26:37 +09:00
Herwin 3467355450
[DOC] Fix typo in docs of Regexp#deconstruct_keys
of => if
2023-10-20 07:18:03 +09:00
Peter Zhu d42b9ffb20 Reuse Regexp ptr when recompiling
When matching an incompatible encoding, the Regexp needs to recompile.
If `usecnt == 0`, then we can reuse the `ptr` because nothing else is
using it. This avoids allocating another `regex_t`.

This speeds up matches that switch to incompatible encodings by 15%.

Branch:

```
Regex#match? with different encoding
                          1.431M (± 1.3%) i/s -      7.264M in   5.076153s
Regex#match? with same encoding
                         16.858M (± 1.1%) i/s -     85.347M in   5.063279s
```

Base:

```
Regex#match? with different encoding
                          1.248M (± 2.0%) i/s -      6.342M in   5.083151s
Regex#match? with same encoding
                         16.377M (± 1.1%) i/s -     82.519M in   5.039504s
```

Script:

```
require "benchmark/ips" # from the benchmark-ips gem

regex = /foo/
str1 = "日本語"
str2 = "English".force_encoding("ASCII-8BIT")

Benchmark.ips do |x|
  x.report("Regex#match? with different encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      regex.match?(str2)
      i += 1
    end
  end

  x.report("Regex#match? with same encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      i += 1
    end
  end
end
```
2023-07-31 09:17:18 -04:00
Takashi Kokubun 9721972175 Resurrect rb_reg_prepare_re C API
Existing strscan releases rely on this C API. It means that the current
Ruby master doesn't work if your Gemfile.lock has strscan unless it's
locked to 3.0.7, which is not released yet.

To fix it, let's not remove the C API we've exposed to users.
2023-07-27 15:30:10 -07:00
Peter Zhu 69b20d1196 Don't load RREGEXP_PTR twice 2023-07-27 14:41:12 -04:00
Peter Zhu 511c51e116 Refactor err string in rb_reg_prepare_re 2023-07-27 14:04:02 -04:00
Peter Zhu 7193b404a1 Add function rb_reg_onig_match
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
2023-07-27 13:33:40 -04:00
Kunshan Wang 639aa76e82
Embed struct rmatch into GC slot (#8097) 2023-07-20 14:17:38 -04:00
Nobuyoshi Nakada 913e01e80e
Stop allocating unused backref strings at `defined?` 2023-06-27 23:14:10 +09:00
Nobuyoshi Nakada df5ae0a550
Use `rb_reg_nth_defined` instead of `rb_match_nth_defined` 2023-06-27 22:39:15 +09:00
Burdette Lamar 932dd9f10e
[DOC] Regexp doc (#7923) 2023-06-20 09:28:21 -04:00
git d7300038e4 * expand tabs. [ci skip]
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2023-06-09 12:45:58 +00:00
Nobuyoshi Nakada ab6eb3786c
Optimize `Regexp#dup` and `Regexp.new(/RE/)`
When copying from another regexp, copy already built `regex_t` instead
of re-compiling its source.
2023-06-09 20:22:30 +09:00
Jeremy Evans a8ba1ddd78 Use UTF-8 encoding for literal extended regexps with UTF-8 characters in comments
Fixes [Bug #19455]
2023-04-23 19:27:58 -07:00
Vladimir Dementyev b09f5c7bf7
MatchData#named_captures: add optional symbolize_names keyword (#6952) 2023-04-19 11:19:31 +12:00
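A short sketch of the keyword in use follows. The helper `symbolized_captures` is a hypothetical name, and the `rescue` branch is an assumption added so the example also runs on interpreters that predate the keyword.

```ruby
# Hypothetical helper: named captures with Symbol keys.
# Falls back to converting the String keys when the interpreter's
# MatchData#named_captures does not accept symbolize_names.
def symbolized_captures(match)
  match.named_captures(symbolize_names: true)
rescue ArgumentError
  match.named_captures.transform_keys(&:to_sym)
end

m = /(?<year>\d+)-(?<month>\d+)/.match("2023-04")
p m.named_captures       # String keys
p symbolized_captures(m) # Symbol keys
```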
Matt Valentine-House 026321c5b9 [Feature #19474] Refactor NEWOBJ macros
NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec
2023-04-06 11:07:16 +01:00
Takashi Kokubun 233ddfac54 Stop exporting symbols for MJIT 2023-03-06 21:59:23 -08:00
Nobuyoshi Nakada a5310e609d [DOC] Fix options of `Regexp#initialize`
`Integer#|` is bit-wise OR operator, not logical OR.
2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada 8ee604b9d4 `rb_scan_args` never fills optional arguments with `Qundef` 2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada 680bd9027f [Bug #19471] `Regexp.compile` should handle keyword arguments
As well as `Regexp.new`, it should pass keyword arguments to the
`Regexp#initialize` method.
2023-03-03 15:27:37 +09:00
Jeremy Evans 04cfb26bd3 Remove support for the Regexp.new 3rd argument
This was deprecated in Ruby 3.2.

Fixes [Bug #18797]
2023-03-01 23:42:47 -08:00
Nobuyoshi Nakada ef00c6da88
Adjust `else` style to be consistent in each file [ci skip] 2023-02-26 13:20:43 +09:00
BurdetteLamar 3b239d2480 Remove (newly unneeded) remarks about aliases 2023-02-19 14:26:34 -08:00
Jean Boussier 46298955e4 Implement Write Barrier for RMatch objects
They only have two references.
2023-02-10 16:12:22 +01:00
OKURA Masafumi 11e0f62148
[DOC] Fix typo in document of regexp [ci skip] 2023-02-10 18:32:21 +09:00
Nobuyoshi Nakada b49cd84311 Remove `REG_LITERAL` flag
All `Regexp` literals are frozen now.
2023-02-09 19:21:24 +09:00
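The observable side of this can be sketched as follows, assuming an interpreter in which Regexp literals are frozen. The helper name `frozen_report` is hypothetical.

```ruby
# Hypothetical helper: report frozen state for a literal and for an
# object built at run time with Regexp.new.
def frozen_report
  { literal: /abc/.frozen?, constructed: Regexp.new("abc").frozen? }
end

p frozen_report
```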
Jeremy Evans eccfc978fd Fix parsing of regexps that toggle extended mode on/off inside regexp
This was broken in ec3542229b. That commit
didn't handle cases where extended mode was turned on/off inside the
regexp.  There are two ways to turn extended mode on/off:

```
/(?-x:#y)#z
/x =~ '#y'

/(?-x)#y(?x)#z
/x =~ '#y'
```

These can be nested inside the same regexp:

```
/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'
```

As you can probably imagine, this makes handling these regexps
somewhat complex. Due to the nesting inside portions of regexps,
the unassign_nonascii function needs to be recursive.  In
recursive mode, it needs to track both opening and closing
parentheses, similar to how it already tracked opening and
closing brackets for character classes.

When scanning the regexp and coming to `(?` not followed by `#`,
scan for options, and use `x` and `i` to determine whether to
turn on or off extended mode.  For `:`, indicating only the
current regexp section should have the extended mode
switched, recurse with the extended mode set or unset. For `)`,
indicating the remainder of the regexp (or current regexp portion
if already recursing) should turn extended mode on or off, just
change the extended mode flag and keep scanning.

While testing this, I noticed that `a`, `d`, and `u` are accepted
as options, in addition to `i`, `m`, and `x`, but I can't see
where those options are documented.  I'm not sure whether or not
handling  `a`, `d`, and `u` as options is a bug.

Fixes [Bug #19379]
2023-01-30 08:51:12 -08:00
Burdette Lamar 30bd2a32fa
[DOC] Correction to RDoc for Regexp.new (#7130)
2023-01-16 11:02:23 -06:00
Jeremy Evans 7e8fa06022 Always issue deprecation warning when calling Regexp.new with 3rd positional argument
Previously, only certain values of the 3rd argument triggered a
deprecation warning.

First step for fix for bug #18797.  Support for the 3rd argument
will be removed after the release of Ruby 3.2.

Fix minor fallout discovered by the tests.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2022-12-22 11:50:26 -08:00
Nobuyoshi Nakada e61e4ae60b
Refactor `reg_extract_args` to return regexp if given 2022-12-22 19:27:27 +09:00
Nobuyoshi Nakada 454c00723a Share argument parsing in `Regexp#initialize` and `Regexp.linear_time?` 2022-12-22 15:51:00 +09:00
卜部昌平 34d43ed9f5 typo in doc [ci skip] 2022-12-19 11:20:55 +09:00
卜部昌平 47a6e7b518 Note about Regexp.linear_time? [ci skip] 2022-12-19 11:05:55 +09:00
TSUYUSATO Kitsune fbedadb61f
Add `Regexp.linear_time?` (#6901) 2022-12-14 12:57:14 +09:00
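A minimal sketch of calling the new method is shown below. The wrapper `linear?` is a hypothetical name, and returning `nil` on interpreters without `Regexp.linear_time?` is an assumption made only to keep the example portable.

```ruby
# Hypothetical wrapper: nil when the interpreter lacks Regexp.linear_time?.
def linear?(re)
  return nil unless Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(re)
end

p linear?(/a+b/)    # a simple pattern
p linear?(/(a)\1/)  # a backreference prevents the linear-time guarantee
```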