Add "Optimization" section to regexp.rdoc (#8849)

* Add "Optimization" section to regexp.rdoc

* Apply the suggestions by @BurdetteLamar

---------

Co-authored-by: Burdette Lamar <BurdetteLamar@Yahoo.com>
Hiroya Fujinami 2023-11-10 01:24:15 +09:00 committed by GitHub
Parent ad3db6711c
Commit c49adfab5d
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file: 27 additions and 0 deletions


@@ -1228,6 +1228,33 @@ when regexp.timeout is non-+nil+, that value controls timing out:
| nil | Float | Times out in Float seconds. |
| Float | Any | Times out in Float seconds. |
== Optimization
For certain values of the pattern and target string,
matching time can grow polynomially or exponentially in relation to the input size;
the potential vulnerability arising from this is the {regular expression denial-of-service}[https://en.wikipedia.org/wiki/ReDoS] (ReDoS) attack.
\Regexp matching can apply an optimization to prevent ReDoS attacks.
When the optimization is applied, matching time increases linearly (not polynomially or exponentially)
in relation to the input size, and a ReDoS attack is not possible.
This optimization is applied if the pattern meets these criteria:
- No backreferences.
- No subexpression calls.
- No nested lookaround anchors or atomic groups.
- No nested quantifiers with counting (i.e. no nested <tt>{n}</tt>,
<tt>{min,}</tt>, <tt>{,max}</tt>, or <tt>{min,max}</tt> style quantifiers)
You can use method Regexp.linear_time? to determine whether a pattern meets these criteria:
  Regexp.linear_time?(/a*/)     # => true
  Regexp.linear_time?('a*')     # => true
  Regexp.linear_time?(/(a*)\1/) # => false
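The check above can also be applied to a whole set of patterns, for example when auditing the regexps an application uses. The following is only a sketch: the helper name +partition_by_linear_time+ is hypothetical (it is not part of Ruby), and it assumes a Ruby version that provides Regexp.linear_time?.

```ruby
# Hypothetical helper (not part of Ruby): split patterns into those for
# which the linear-time optimization can be applied and those for which
# it cannot. Assumes Regexp.linear_time? is available.
def partition_by_linear_time(patterns)
  patterns.partition { |pattern| Regexp.linear_time?(pattern) }
end

# The three patterns are the ones used in the examples above;
# Regexp.linear_time? accepts either a Regexp or a String.
optimizable, not_optimizable = partition_by_linear_time([/a*/, 'a*', /(a*)\1/])

optimizable.each     { |pattern| puts "linear time:     #{pattern.inspect}" }
not_optimizable.each { |pattern| puts "not linear time: #{pattern.inspect}" }
```

Patterns that land in the second group are candidates for rewriting or for an explicit timeout.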
However, an untrusted source may not be safe even if the method returns +true+,
because the optimization uses memoization (which may cause large memory consumption).
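One possible way to handle an untrusted source is to combine an input-length limit with a per-regexp timeout. The sketch below is an illustration under stated assumptions, not a recommended or complete defense: the helper +guarded_match?+ and the constant +MAX_INPUT_LENGTH+ are hypothetical names, the limit of 10,000 characters is arbitrary, and capping the input length is assumed (not confirmed here) to keep memoization memory bounded.

```ruby
# Assumed, arbitrary limit on input size; tune for the application.
MAX_INPUT_LENGTH = 10_000

# Hypothetical helper (not part of Ruby).
# Returns true or false for a completed match attempt, or nil when the
# input is rejected as too long or the match times out.
def guarded_match?(pattern_source, input, timeout: 1.0)
  # Refuse oversized input before any matching work is done.
  return nil if input.length > MAX_INPUT_LENGTH

  # A non-nil timeout on the regexp itself controls timing out.
  Regexp.new(pattern_source, timeout: timeout).match?(input)
rescue Regexp::TimeoutError
  nil
end

puts guarded_match?('a+b', 'aaab').inspect
puts guarded_match?('a+b', 'ccc').inspect
puts guarded_match?('a*', 'a' * (MAX_INPUT_LENGTH + 1)).inspect
```

Returning +nil+ rather than +false+ lets the caller tell "did not match" apart from "was not allowed to finish".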
== References
Read (online PDF books):