151 commits

Author SHA1 Message Date
NAITOH Jun e73f35ddaf [ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: Remove
unnecessary use of `rb_enc_get()`
(https://github.com/ruby/strscan/pull/108)

- before: #106

## Why?

In `rb_strseq_index()`, the result of `rb_enc_check()` is passed on as the encoding for the search, so a separate `rb_enc_get()` call is unnecessary.

-
6c7209cd37/string.c (L4335-L4368)
> enc = rb_enc_check(str, sub);

> return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len,
offset, enc);

-
6c7209cd37/string.c (L4309-L4318)
```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
            const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
{
    const char *search_start = str_ptr;
    long pos, search_len = str_len - offset;

    for (;;) {
        const char *t;
        pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```

## Benchmark

The benchmark shows that a String pattern (`string_var`) is 1.24x faster than a Regexp pattern (`regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.225M i/s -      9.328M times in 1.011068s (108.40ns/i)
          regexp_var     9.327M i/s -      9.413M times in 1.009214s (107.21ns/i)
              string     9.200M i/s -      9.355M times in 1.016840s (108.70ns/i)
          string_var    11.249M i/s -     11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
              regexp     9.565M i/s -     27.676M times in 2.893476s (104.55ns/i)
          regexp_var    10.111M i/s -     27.982M times in 2.767496s (98.90ns/i)
              string    10.060M i/s -     27.600M times in 2.743465s (99.40ns/i)
          string_var    12.519M i/s -     33.746M times in 2.695615s (79.88ns/i)

Comparison:
          string_var:  12518707.2 i/s
          regexp_var:  10111089.6 i/s - 1.24x  slower
              string:  10060144.4 i/s - 1.24x  slower
              regexp:   9565124.4 i/s - 1.31x  slower
```

https://github.com/ruby/strscan/commit/ff2d7afa19
2024-10-26 18:44:15 +09:00
Nobuyoshi Nakada d6046bccb7 [ruby/strscan] Use C90 as far as supporting 2.6 or earlier
(https://github.com/ruby/strscan/pull/101)

https://github.com/ruby/strscan/commit/d31274f41b
2024-10-26 18:44:15 +09:00
NAITOH Jun d81b0588bb
[ruby/strscan] Accept String as a pattern at non head
(https://github.com/ruby/strscan/pull/106)

It supports non-head match cases such as StringScanner#scan_until.
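
As a rough illustration of what this enables (a sketch, not code from the PR): the helper name `scan_until_literal` below is hypothetical, and the `TypeError` fallback assumes that strscan releases without this change reject a String passed to `scan_until`.

```ruby
require "strscan"

# Hypothetical helper: scan up to and including a literal substring.
# Passes the String directly where strscan accepts it; otherwise
# (assumption: a TypeError is raised) falls back to an escaped Regexp.
def scan_until_literal(scanner, literal)
  scanner.scan_until(literal)
rescue TypeError
  scanner.scan_until(Regexp.new(Regexp.escape(literal)))
end

s = StringScanner.new("key=value;rest")
scan_until_literal(s, "=")  #=> "key="
scan_until_literal(s, ";")  #=> "value;"
scan_until_literal(s, "#")  #=> nil (no match, position unchanged)
```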

If we use a String as a pattern, we can improve match performance.
Here is a result of the included benchmark.

## CRuby

The benchmark shows that a String pattern (`string_var`) is 1.18x faster than a Regexp pattern (`regexp`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
          regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
              string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
          string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
              regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
          regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
              string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
          string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
          string_var:  12161920.6 i/s
              regexp:  10318949.7 i/s - 1.18x  slower
          regexp_var:  10031617.6 i/s - 1.21x  slower
              string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

The benchmark shows that a String pattern (`string`) is 2.11x faster than a Regexp pattern (`regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
          regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
              string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
          string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
              regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
          regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
              string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
          string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
              string:  21389191.1 i/s
          string_var:  20897327.5 i/s - 1.02x  slower
          regexp_var:  10116464.7 i/s - 2.11x  slower
              regexp:   9409222.3 i/s - 2.27x  slower
```

See:
be7815ec02/core/src/main/java/org/jruby/util/StringSupport.java (L1706-L1736)

---------

https://github.com/ruby/strscan/commit/f9d96c446a

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-09-17 15:12:25 +09:00
Hiroshi SHIBATA 32f134bb85
Added pre-release suffix for development version of default gems
https://github.com/ruby/stringio/issues/81
2024-08-31 14:22:17 +09:00
Hiroshi SHIBATA 3eda59e975
Sync strscan HEAD again.
https://github.com/ruby/strscan/pull/99 split document with multi-byte
chars.
2024-06-04 12:40:08 +09:00
Hiroshi SHIBATA 78bfde5d9f
Revert "[ruby/strscan] Doc for StringScanner"
This reverts commit 974ed1408c.
2024-05-30 21:13:10 +09:00
Hiroshi SHIBATA d70b0da482
Revert "Fix reference path for strscan documentation"
This reverts commit 1fa93fb948.
2024-05-30 21:13:01 +09:00
Hiroshi SHIBATA 1fa93fb948
Fix reference path for strscan documentation 2024-05-30 14:29:25 +09:00
Burdette Lamar 974ed1408c
[ruby/strscan] Doc for StringScanner
(https://github.com/ruby/strscan/pull/96)

#peek_byte and #scan_byte not updated (not available in my repo --
sorry).

---------

https://github.com/ruby/strscan/commit/0123da7352

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
2024-05-30 12:34:18 +09:00
卜部昌平 c844968b72 ruby tool/update-deps --fix 2024-04-27 21:55:28 +09:00
Aaron Patterson 164e464b04 [ruby/strscan] Add a method for peeking and reading bytes as
integers
(https://github.com/ruby/strscan/pull/89)

This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.

Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high performance lexers.
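
A minimal sketch of how the two methods described above might be used. The wrapper names `next_byte` and `current_byte` are hypothetical. They exist only so the example also runs on strscan versions that predate `scan_byte` and `peek_byte`, where the fallback goes through `get_byte` (which allocates a String) or indexes the underlying string.

```ruby
require "strscan"

# Hypothetical wrapper: return the current byte as an Integer and advance,
# or nil at the end of the string.
def next_byte(scanner)
  if scanner.respond_to?(:scan_byte)
    scanner.scan_byte
  else
    scanner.get_byte&.getbyte(0) # fallback allocates a one-byte String
  end
end

# Hypothetical wrapper: return the current byte as an Integer
# without advancing, or nil at the end of the string.
def current_byte(scanner)
  if scanner.respond_to?(:peek_byte)
    scanner.peek_byte
  else
    scanner.string.getbyte(scanner.pos)
  end
end

s = StringScanner.new("ab")
current_byte(s)  #=> 97 (cursor stays at 0)
next_byte(s)     #=> 97 (cursor moves to 1)
next_byte(s)     #=> 98
next_byte(s)     #=> nil
```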

---------

https://github.com/ruby/strscan/commit/873aba2e5d

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-02-26 15:54:54 +09:00
Sutou Kouhei ce2618c628
[ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/ba338b882c
2024-02-08 14:43:56 +09:00
Sutou Kouhei 5afae77ce9
[ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/842845af1f
2024-02-08 14:43:56 +09:00
Sutou Kouhei ac636f5709
[ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/d6f97ec102
2024-01-19 10:49:12 +09:00
NAITOH Jun 338eb0065b [ruby/strscan] StringScanner#captures: Return nil not "" for
unmatched capture
(https://github.com/ruby/strscan/pull/72)

fix https://github.com/ruby/strscan/issues/70
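If there is no substring matching a group (here the third group), `s[3]`
and `s.captures` behave differently: `s[3]` returns nil, but `s.captures`
returns "" for that group.

If there is no substring matching the group, the corresponding element
of `s.captures` should be nil, as it is for `s[3]`. The first example
below shows the old behavior, the second shows the new behavior, and the
third shows the matching behavior of `MatchData#captures`.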

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/  #=> "foobar"
s[0]           #=> "foobar"
s[1]           #=> "foo"
s[2]           #=> "bar"
s[3]           #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/  #=> "foobar"
s[0]           #=> "foobar"
s[1]           #=> "foo"
s[2]           #=> "bar"
s[3]           #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```

https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html
```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a        #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```

* StringScanner#captures is not yet documented.
https://docs.ruby-lang.org/ja/latest/class/StringScanner.html

https://github.com/ruby/strscan/commit/1fbfdd3c6f
2024-01-14 22:27:24 +09:00
Hiroshi SHIBATA f54369830f Revert "Rollback to released version numbers of stringio and strscan"
This reverts commit 6a79e53823.
2023-12-25 21:12:49 +09:00
Hiroshi SHIBATA 6a79e53823
Rollback to released version numbers of stringio and strscan 2023-12-16 12:00:59 +08:00
Sutou Kouhei ce8301084f [ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/1b3393be05
2023-11-08 09:26:58 +09:00
Peter Zhu 91e13a5207 [ruby/strscan] Fix indentation in strscan.c
[ci skip]
2023-07-28 10:12:52 -04:00
Peter Zhu 7193b404a1 Add function rb_reg_onig_match
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
2023-07-27 13:33:40 -04:00
Peter Zhu e27eab2f85 [ruby/strscan] Sync missed commit
Syncs commit ruby/strscan@76b377a5d8.
2023-07-27 09:42:42 -04:00
Matt Valentine-House 5e4b80177e Update the depend files 2023-02-28 09:09:00 -08:00
Matt Valentine-House f38c6552f9 Remove intern/gc.h from Make deps 2023-02-27 10:11:56 -08:00
Sutou Kouhei 18e840ac60 [ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/681cde0f27
2023-02-21 19:31:36 +09:00
OKURA Masafumi a44f5ab089 [ruby/strscan] Mention return value of `rest?` in the doc
(https://github.com/ruby/strscan/pull/49)

The doc of `rest?` was unclear about return value. This commit adds the
return value to the doc.
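
For reference, a short sketch of the return values in question (an illustration, not text from the updated doc):

```ruby
require "strscan"

s = StringScanner.new("ab")
s.rest?       #=> true  (there is still data to scan)
s.scan(/ab/)
s.rest?       #=> false (the scanner is at the end of the string)
```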
2023-02-21 19:31:35 +09:00
Nobuyoshi Nakada 899ea35035
Extract include/ruby/internal/attr/packed_struct.h
Split `PACKED_STRUCT` and `PACKED_STRUCT_UNALIGNED` macros into the
macros below:
* `RBIMPL_ATTR_PACKED_STRUCT_BEGIN`
* `RBIMPL_ATTR_PACKED_STRUCT_END`
* `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_BEGIN`
* `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_END`
2023-02-08 12:34:13 +09:00
Sutou Kouhei 79ad045214 [ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/3ada12613d
2022-12-26 15:09:21 +09:00
Hiroshi SHIBATA 4e31fea77d Merge strscan-3.0.5 2022-12-09 16:36:22 +09:00
Peter Zhu 2d5ecd60a5 [Feature #18249] Update dependencies 2022-02-22 09:55:21 -05:00
Nobuyoshi Nakada ac152b3cac
Update dependencies 2021-11-21 16:21:18 +09:00
Sutou Kouhei c0c43276a1 [ruby/strscan] Bump version
If we use the same version as the default strscan gem in Ruby, "gem
install" doesn't extract .gem. It fails "gem install" because "gem
install" can't find ext/strscan/ to be built.

https://github.com/ruby/strscan/commit/3ceafa6cdc
2021-10-24 05:57:48 +09:00
卜部昌平 5c167a9778 ruby tool/update-deps --fix 2021-10-05 14:18:23 +09:00
Gannon McGibbon a42b7de436 [ruby/strscan] Replace "iff" with "if and only if" (#18)
iff means if and only if, but readers without that knowledge might
assume this to be a spelling mistake. To me, this seems like
exclusionary language that is unnecessary. Simply using "if and only if"
instead should suffice.

https://github.com/ruby/strscan/commit/066451c11e
2021-05-06 16:21:14 +09:00
Kenichi Kamiya 564ccd095a [ruby/strscan] Fix segmentation fault of `StringScanner#charpos` when `String#byteslice` returns non string value [Bug #17756] (#20)
https://github.com/ruby/strscan/commit/92961cde2b
2021-05-06 16:20:38 +09:00
Hiroshi SHIBATA 822eb94563
Import from https://github.com/ruby/strscan/pull/19
* Use Gemfile instead of Gem::Specification#add_development_dependency.

* Use pend instead of skip for test-unit.
2021-05-06 16:18:58 +09:00
卜部昌平 6413dc27dc dependency updates 2021-04-13 14:30:21 +09:00
Jeremy Evans c03b723f56 Update class documentation for StringScanner
The [] wasn't being displayed. Also try to fix formatting for bol?
and << (even if they aren't linked).

Fixes [Bug #17620]
2021-02-10 08:17:07 -08:00
Kenta Murata b5de66e132
[strscan] Fix license comment and files
https://github.com/ruby/strscan/commit/a999f2c6d1
2020-12-18 14:25:48 +09:00
Kenta Murata 5370963992
[strscan] Version 3.0.0
https://github.com/ruby/strscan/commit/08645e4e77
2020-12-18 14:25:42 +09:00
Kenta Murata 985f0af257
[strscan] Make strscan Ractor safe (#17)
* Make strscan Ractor safe

* Add test-unit in the development dependencies

https://github.com/ruby/strscan/commit/3c93c2bebe
2020-12-18 14:25:41 +09:00
Aaron Patterson 6aa466ba9c mark regex internal to string scanner 2020-10-02 12:01:57 -07:00
Jeremy Evans d9b8411a7b Document that StringScanner#matched_size returns size in bytes [ci skip]
Fixes [Bug #17139]
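
A small sketch of the distinction this documentation change describes. The sample string is an arbitrary choice for illustration; `\u00E9` is a single character that takes two bytes in UTF-8.

```ruby
require "strscan"

s = StringScanner.new("caf\u00E9 au lait")
s.scan(/caf\u00E9/)

s.matched_size      #=> 5 (bytes)
s.matched.size      #=> 4 (characters)
s.matched.bytesize  #=> 5
```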
2020-09-02 10:41:49 -07:00
Sutou Kouhei c23c880f56
[ruby/strscan] Bump version
https://github.com/ruby/strscan/commit/df90d541fa
2020-08-31 21:57:35 +09:00
Nobuyoshi Nakada c76508b88c
[ruby/strscan] Replaced examples using $KCODE with encodings
`$KCODE` has been deprecated and has had no effect for years.

https://github.com/ruby/strscan/commit/7c4dbd4cb3
2020-08-31 21:57:35 +09:00
卜部昌平 490010084e sed -i '/rmodule.h/d' 2020-08-27 16:42:06 +09:00
卜部昌平 756403d775 sed -i '/r_cast.h/d' 2020-08-27 15:03:36 +09:00
卜部昌平 0da2a3f1fc sed -i '\,2/extern.h,d' 2020-08-27 14:07:49 +09:00
Hiroshi SHIBATA 8fb02b7a97
Update the license for the default gems to dual licenses 2020-08-18 20:26:39 +09:00
卜部昌平 9e41a75255 sed -i 's|ruby/impl|ruby/internal|'
To fix build failures.
2020-05-11 09:24:08 +09:00
卜部昌平 d7f4d732c1 sed -i s|ruby/3|ruby/impl|g
This shall fix compile errors.
2020-05-11 09:24:08 +09:00