635 Commits

Author SHA1 Message Date
Dustin Brown d89280e8bf
Copy encoding flags when copying a regex [Bug #20039]
* 🐛 Fixes [Bug #20039](https://bugs.ruby-lang.org/issues/20039)

When a Regexp is initialized with another Regexp, we simply copy the
properties from the original. However, the flags on the original were
not being copied correctly. This caused an issue when the original had
multibyte characters and was being compared with an ASCII string.
Without the forced encoding flag (`KCODE_FIXED`) transferred on to the
new Regexp, the comparison would fail. See the included test for an
example.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2023-12-06 19:25:29 -08:00
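A minimal sketch of the behaviour the encoding-flag fix above is meant to guarantee, assuming a Ruby build that includes it: a copy made with `Regexp.new(regexp)` reports the same fixed-encoding flag as the original. The helper names `multibyte_regexp` and `copy_keeps_flags?` are hypothetical and exist only for this illustration; they are not taken from the commit's test.

```ruby
# Hypothetical helper: a regexp containing a multibyte character.
# The \u escape makes the literal fixed to UTF-8.
def multibyte_regexp
  /\u3042/
end

# Hypothetical helper: true when a copy built with Regexp.new reports the
# same fixed-encoding flag, options and source as the original.
def copy_keeps_flags?(original)
  copy = Regexp.new(original)
  copy.fixed_encoding? == original.fixed_encoding? &&
    copy.options == original.options &&
    copy.source == original.source
end

puts multibyte_regexp.fixed_encoding?     # the original is fixed-encoding
puts copy_keeps_flags?(multibyte_regexp)  # the copy should agree
# Matching the copy against a plain ASCII string should simply not match.
puts Regexp.new(multibyte_regexp).match?("plain ASCII")
```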
Nobuyoshi Nakada caa9881fde
[DOC] Fix doc/regexp.rdoc links
- Rename regexp.rdoc to exclude it from "Pages".  This file is meant to be
  included in the "class Regexp" document, but it also appeared as a
  duplicate separate page.
- Fix links on case-sensitive filesystems.
- Fix to use rdoc-ref instead of converted HTML page names.
2023-11-14 15:56:57 +09:00
Herwin 8b3d044004
[DOC] Indentation fix in comments of MatchData#inspect
The old version did not add syntax highlighting to the code block, and
included the "Related:" line in the code block as well.
2023-10-20 18:26:37 +09:00
Herwin 3467355450
[DOC] Fix typo in docs of Regexp#deconstruct_keys
of => if
2023-10-20 07:18:03 +09:00
Peter Zhu d42b9ffb20 Reuse Regexp ptr when recompiling
When matching an incompatible encoding, the Regexp needs to recompile.
If `usecnt == 0`, then we can reuse the `ptr` because nothing else is
using it. This avoids allocating another `regex_t`.

This speeds up matches that switch to incompatible encodings by 15%.

Branch:

```
Regex#match? with different encoding
                          1.431M (± 1.3%) i/s -      7.264M in   5.076153s
Regex#match? with same encoding
                         16.858M (± 1.1%) i/s -     85.347M in   5.063279s
```

Base:

```
Regex#match? with different encoding
                          1.248M (± 2.0%) i/s -      6.342M in   5.083151s
Regex#match? with same encoding
                         16.377M (± 1.1%) i/s -     82.519M in   5.039504s
```

Script:

```
require "benchmark/ips"

regex = /foo/
str1 = "日本語"
str2 = "English".force_encoding("ASCII-8BIT")

Benchmark.ips do |x|
  x.report("Regex#match? with different encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      regex.match?(str2)
      i += 1
    end
  end

  x.report("Regex#match? with same encoding") do |times|
    i = 0
    while i < times
      regex.match?(str1)
      i += 1
    end
  end
end
```
2023-07-31 09:17:18 -04:00
Takashi Kokubun 9721972175 Resurrect rb_reg_prepare_re C API
Existing strscan releases rely on this C API. It means that the current
Ruby master doesn't work if your Gemfile.lock has strscan unless it's
locked to 3.0.7, which is not released yet.

To fix it, let's not remove the C API we've exposed to users.
2023-07-27 15:30:10 -07:00
Peter Zhu 69b20d1196 Don't load RREGEXP_PTR twice 2023-07-27 14:41:12 -04:00
Peter Zhu 511c51e116 Refactor err string in rb_reg_prepare_re 2023-07-27 14:04:02 -04:00
Peter Zhu 7193b404a1 Add function rb_reg_onig_match
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
2023-07-27 13:33:40 -04:00
Kunshan Wang 639aa76e82
Embed struct rmatch into GC slot (#8097) 2023-07-20 14:17:38 -04:00
Nobuyoshi Nakada 913e01e80e
Stop allocating unused backref strings at `defined?` 2023-06-27 23:14:10 +09:00
Nobuyoshi Nakada df5ae0a550
Use `rb_reg_nth_defined` instead of `rb_match_nth_defined` 2023-06-27 22:39:15 +09:00
Burdette Lamar 932dd9f10e
[DOC] Regexp doc (#7923) 2023-06-20 09:28:21 -04:00
git d7300038e4 * expand tabs. [ci skip]
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2023-06-09 12:45:58 +00:00
Nobuyoshi Nakada ab6eb3786c
Optimize `Regexp#dup` and `Regexp.new(/RE/)`
When copying from another regexp, copy already built `regex_t` instead
of re-compiling its source.
2023-06-09 20:22:30 +09:00
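The two copy paths named in the commit above can be exercised as follows. This is only a sketch of the observable behaviour (the copies compare equal to the original but are distinct objects); it does not show the internal `regex_t` reuse, and the helper name `regexp_copies` is hypothetical.

```ruby
# Hypothetical helper: returns both kinds of copy for comparison.
def regexp_copies(original)
  [original.dup, Regexp.new(original)]
end

original = /colou?r/i
regexp_copies(original).each do |copy|
  puts copy == original        # same source and options
  puts copy.equal?(original)   # but not the same object
  puts copy.match?("COLOR")    # the copy matches like the original
end
```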
Jeremy Evans a8ba1ddd78 Use UTF-8 encoding for literal extended regexps with UTF-8 characters in comments
Fixes [Bug #19455]
2023-04-23 19:27:58 -07:00
Vladimir Dementyev b09f5c7bf7
MatchData#named_captures: add optional symbolize_names keyword (#6952) 2023-04-19 11:19:31 +12:00
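A short sketch of the `symbolize_names` keyword added above. The wrapper `captures_with_symbol_keys` is a hypothetical helper: it assumes that a Ruby without the keyword raises `ArgumentError` for the extra argument, and in that case falls back to converting the string keys by hand.

```ruby
# Hypothetical helper: named captures keyed by symbols.
def captures_with_symbol_keys(match)
  match.named_captures(symbolize_names: true)
rescue ArgumentError
  # Assumed fallback for Rubies where named_captures takes no arguments.
  match.named_captures.transform_keys(&:to_sym)
end

match = /(?<year>\d{4})-(?<month>\d{2})/.match("2023-04")
p captures_with_symbol_keys(match)
```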
Matt Valentine-House 026321c5b9 [Feature #19474] Refactor NEWOBJ macros
NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec
2023-04-06 11:07:16 +01:00
Takashi Kokubun 233ddfac54 Stop exporting symbols for MJIT 2023-03-06 21:59:23 -08:00
Nobuyoshi Nakada a5310e609d [DOC] Fix options of `Regexp#initialize`
`Integer#|` is bit-wise OR operator, not logical OR.
2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada 8ee604b9d4 `rb_scan_args` never fills optional arguments with `Qundef` 2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada 680bd9027f [Bug #19471] `Regexp.compile` should handle keyword arguments
As well as `Regexp.new`, it should pass keyword arguments to the
`Regexp#initialize` method.
2023-03-03 15:27:37 +09:00
Jeremy Evans 04cfb26bd3 Remove support for the Regexp.new 3rd argument
This was deprecated in Ruby 3.2.

Fixes [Bug #18797]
2023-03-01 23:42:47 -08:00
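Code that passed `"n"` as the third positional argument to request "no encoding" can express the same intent through the options argument instead. A minimal sketch, assuming `Regexp::NOENCODING` is the intended replacement for that use; the helper name `binary_regexp` is hypothetical.

```ruby
# Hypothetical helper: builds a regexp with the NOENCODING option set,
# combined with any other option bits the caller passes in.
def binary_regexp(source, extra_options = 0)
  Regexp.new(source, extra_options | Regexp::NOENCODING)
end

re = binary_regexp("abc", Regexp::IGNORECASE)
puts (re.options & Regexp::NOENCODING) != 0   # NOENCODING bit is set
puts (re.options & Regexp::IGNORECASE) != 0   # other options are kept
puts re.match?("ABC")
```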
Nobuyoshi Nakada ef00c6da88
Adjust `else` style to be consistent in each file [ci skip] 2023-02-26 13:20:43 +09:00
BurdetteLamar 3b239d2480 Remove (newly unneeded) remarks about aliases 2023-02-19 14:26:34 -08:00
Jean Boussier 46298955e4 Implement Write Barrier for RMatch objects
They only have two references.
2023-02-10 16:12:22 +01:00
OKURA Masafumi 11e0f62148
[DOC] Fix typo in document of regexp [ci skip] 2023-02-10 18:32:21 +09:00
Nobuyoshi Nakada b49cd84311 Remove `REG_LITERAL` flag
All `Regexp` literals are frozen now.
2023-02-09 19:21:24 +09:00
Jeremy Evans eccfc978fd Fix parsing of regexps that toggle extended mode on/off inside regexp
This was broken in ec3542229b. That commit
didn't handle cases where extended mode was turned on/off inside the
regexp.  There are two ways to turn extended mode on/off:

```
/(?-x:#y)#z
/x =~ '#y'

/(?-x)#y(?x)#z
/x =~ '#y'
```

These can be nested inside the same regexp:

```
/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'
```

As you can probably imagine, this makes handling these regexps
somewhat complex. Due to the nesting inside portions of regexps,
the unassign_nonascii function needs to be recursive.  In
recursive mode, it needs to track both opening and closing
parentheses, similar to how it already tracked opening and
closing brackets for character classes.

When scanning the regexp and coming to `(?` not followed by `#`,
scan for options, and use `x` and `i` to determine whether to
turn on or off extended mode.  For `:`, indicating only the
current regexp section should have the extended mode
switched, recurse with the extended mode set or unset. For `)`,
indicating the remainder of the regexp (or current regexp portion
if already recursing) should turn extended mode on or off, just
change the extended mode flag and keep scanning.

While testing this, I noticed that `a`, `d`, and `u` are accepted
as options, in addition to `i`, `m`, and `x`, but I can't see
where those options are documented.  I'm not sure whether or not
handling `a`, `d`, and `u` as options is a bug.

Fixes [Bug #19379]
2023-01-30 08:51:12 -08:00
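The three toggling forms from the commit message above can be tried out with `Regexp.new`, which avoids multi-line literals. This is a sketch of the expected matching behaviour only, not of the recursive `unassign_nonascii` scan; the helper names are hypothetical.

```ruby
# (?-x:...) switches extended mode off only inside the group.
def scoped_toggle
  Regexp.new("(?-x:#y)#z\n", Regexp::EXTENDED)
end

# (?-x) and (?x) switch extended mode for the remainder of the regexp.
def inline_toggle
  Regexp.new("(?-x)#y(?x)#z\n", Regexp::EXTENDED)
end

# Both forms nested inside the same regexp.
def nested_toggle
  Regexp.new("(?-x:(?x)#x\n(?-x)#y)#z\n", Regexp::EXTENDED)
end

# In each case "#y" is a literal and "#z" (and "#x") are comments,
# so each regexp is expected to match '#y' at index 0.
[scoped_toggle, inline_toggle, nested_toggle].each do |re|
  p re =~ "#y"
end
```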
Burdette Lamar 30bd2a32fa
[DOC] Correction to RDoc for Regexp.new (#7130)
2023-01-16 11:02:23 -06:00
Jeremy Evans 7e8fa06022 Always issue deprecation warning when calling Regexp.new with 3rd positional argument
Previously, only certain values of the 3rd argument triggered a
deprecation warning.

First step of the fix for bug #18797.  Support for the 3rd argument
will be removed after the release of Ruby 3.2.

Fix minor fallout discovered by the tests.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2022-12-22 11:50:26 -08:00
Nobuyoshi Nakada e61e4ae60b
Refactor `reg_extract_args` to return regexp if given 2022-12-22 19:27:27 +09:00
Nobuyoshi Nakada 454c00723a Share argument parsing in `Regexp#initialize` and `Regexp.linear_time?` 2022-12-22 15:51:00 +09:00
卜部昌平 34d43ed9f5 typo in doc [ci skip] 2022-12-19 11:20:55 +09:00
卜部昌平 47a6e7b518 Note about Regexp.linear_time? [ci skip] 2022-12-19 11:05:55 +09:00
TSUYUSATO Kitsune fbedadb61f
Add `Regexp.linear_time?` (#6901) 2022-12-14 12:57:14 +09:00
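A brief sketch of how `Regexp.linear_time?` might be used to screen patterns. It assumes a Ruby that provides the method (the helper returns `nil` otherwise) and that a backreference prevents linear-time matching; the helper name `linear?` is hypothetical.

```ruby
# Hypothetical helper: asks Ruby whether the regexp can be matched in
# linear time, or returns nil when this Ruby lacks Regexp.linear_time?.
def linear?(regexp)
  return nil unless Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(regexp)
end

p linear?(/a+b/)     # simple pattern
p linear?(/(a)\1/)   # pattern with a backreference
```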
S-H-GAMELINKS 1a64d45c67 Introduce encoding check macro 2022-12-02 01:31:27 +09:00
Yusuke Endoh ab4c7077cc Prevent segfault in String#scan with ObjectSpace.each_object
Calling `String#scan` without a block creates an incomplete MatchData
object whose `RMATCH(match)->str` is Qfalse. Usually this object is not
leaked, but it was possible to pull it by using ObjectSpace.each_object.

This change hides the internal MatchData object by using rb_obj_hide.

Fixes [Bug #19159]
2022-12-01 02:38:51 +09:00
S-H-GAMELINKS 1f4f6c9832 Using UNDEF_P macro 2022-11-16 18:58:33 +09:00
Nobuyoshi Nakada 001606097b Suppress false warning by a bug of gcc
GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before
`match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`.

[Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
2022-11-08 16:13:30 +09:00
Yusuke Endoh 67ed70da61 Refactor timeout-setting code to a function 2022-10-24 18:21:30 +09:00
Yusuke Endoh ef01482f64 Refactor timeout-related code in re.c a little 2022-10-24 18:13:26 +09:00
Yusuke Endoh b51b22513f
Fix per-instance Regexp timeout (#6621)
This makes it follow what was decided in [Bug #19055]:

* `Regexp.new(str, timeout: nil)` should respect the global timeout
* `Regexp.new(str, timeout: huge_val)` should use the maximum value that
  can be represented in the internal representation
* `Regexp.new(str, timeout: 0 or negative value)` should raise an error
2022-10-24 18:03:26 +09:00
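The rules listed in the commit above can be checked with a small script. This is a sketch that assumes a Ruby with per-instance timeouts (`Regexp#timeout`) and that the error raised for a zero value is `ArgumentError`; the helper names are hypothetical.

```ruby
# Hypothetical helper: does this Ruby support per-instance timeouts?
def timeout_supported?
  Regexp.method_defined?(:timeout)
end

# Hypothetical helper: builds a regexp with its own timeout in seconds.
def regexp_with_timeout(source, seconds)
  Regexp.new(source, timeout: seconds)
end

if timeout_supported?
  p regexp_with_timeout("a+", 2.0).timeout  # per-instance value
  p regexp_with_timeout("a+", nil).timeout  # nil: the global timeout applies
  begin
    regexp_with_timeout("a+", 0)
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```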
S-H-GAMELINKS c4089e6524 Fix argument & Remove enum 2022-10-23 17:38:59 +09:00
S-H-GAMELINKS 1e06ef1328 Introduce rb_memsearch_with_char_size function 2022-10-23 17:38:59 +09:00
git 2dd1a037de * expand tabs. [ci skip]
Tabs were expanded because the file did not have any tab indentation in unedited lines.
Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
2022-10-10 13:22:15 +09:00
Nobuyoshi Nakada 0a98dd1cff
Should use dedicated function `Check_Type` 2022-10-10 13:21:57 +09:00
Vladimir Dementyev 4954c9fc0f Add MatchData#deconstruct/deconstruct_keys 2022-10-10 12:41:13 +09:00
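With `deconstruct_keys`, a `MatchData` can be used directly in a hash pattern. A minimal sketch, assuming a Ruby that defines `MatchData#deconstruct_keys`; on a Ruby without it the pattern simply does not match and the method returns `nil`. The method name `parse_year_month` is hypothetical.

```ruby
# Hypothetical helper: extracts [year, month] from text such as "2022-10"
# by pattern matching on the named captures of the MatchData.
def parse_year_month(text)
  case /(?<year>\d{4})-(?<month>\d{2})/.match(text)
  in { year:, month: }
    [year.to_i, month.to_i]
  else
    nil   # no match (or no deconstruct_keys support)
  end
end

p parse_year_month("2022-10")
p parse_year_month("no date here")
```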
Nobuyoshi Nakada c53667691a
[DOC] `offset` argument of Regexp#match 2022-08-18 23:25:05 +09:00
Aaron Patterson e4e054e3ce Speed up setting the backref match object
This patch speeds up setting the backref match object by avoiding some
memcopies.  Take the following code for example:

```ruby
"hello world" =~ /hello/
p $~
```

When the RE matches the string, we have to set the Match object in the
backref global.  So we would allocate a match object[^1] and use
`rb_reg_region_copy`[^2] to make a deep copy of the stack allocated
`re_registers` struct[^3] into the newly created Ruby object.  This
could possibly trigger GC[^4], and would allocate new memory.

This patch makes a shallow copy of the `re_registers` struct onto the
Match object, allowing the match object to manage the `re_registers`
pointer and also avoiding some calls to `xmalloc` and some manual
memcopy.

Benchmark looks like this:

```ruby

require "benchmark/ips"

def test_re thing
  thing =~ /hello/
end

Benchmark.ips do |x|
  x.report("re hit") do
    test_re "hello world"
  end

  x.report("re miss") do
    test_re "world"
  end
end
```

Before this patch:

```
$ ruby -v test.rb
ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
Warming up --------------------------------------
              re hit   345.401k i/100ms
             re miss   673.584k i/100ms
Calculating -------------------------------------
              re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
             re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s
```

After this patch:

```
$ ./ruby -v test.rb
ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
Warming up --------------------------------------
              re hit   419.578k i/100ms
             re miss   673.251k i/100ms
Calculating -------------------------------------
              re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
             re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s
```

Matches get faster and misses maintain the same speed.

[^1]: 24204d54ab/re.c (L1737)
[^2]: 24204d54ab/re.c (L1738)
[^3]: 24204d54ab/re.c (L1686)
[^4]: 24204d54ab/re.c (L981)
2022-08-02 09:04:04 -07:00
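For readers unfamiliar with the backref global discussed in the commit above, this sketch shows what `$~` holds after a hit and after a miss. It illustrates only the Ruby-level behaviour, not the `re_registers` copying; the helper name `match_summary` is hypothetical.

```ruby
# Hypothetical helper: returns the matched text and what follows it,
# or nil when the regexp does not match.
def match_summary(text)
  if text =~ /hello/
    # After a hit, $~ holds the MatchData for the last match.
    [$~[0], $~.post_match]
  else
    # After a miss, $~ is nil.
    $~
  end
end

p match_summary("hello world")
p match_summary("world")
```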