Restore `rescue`-context from the outer context.
`retry` targets the next outer block except for between `rescue` and
`else` or `ensure`, otherwise, if there is no enclosing block, it
should be syntax error.
All kind of AST nodes use same struct RNode, which has u1, u2, u3 union members
for holding different kind of data.
This has two problems.
1. Low flexibility of data structure
Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However they use same
structure definition, need to allocate three union members for NODE_TRUE and
need to separate NODE_OP_ASGN2 into another node.
This change removes the restriction so make it possible to
change data structure by each node type.
2. No compile time check for union member access
It’s developer’s responsibility for using correct member for each node type when it’s union.
This change clarifies which node has which type of fields and enables compile time check.
This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
NODE_ARGS, NODE_ARYPTN, NODE_FNDPTN manage memory of their
structure by imemo tmpbuf Object.
However rb_ast_struct has reference to NODE. Then these
memory can be freed directly when rb_ast_struct is freed.
This commit reduces parser's dependency on CRuby functions.
[Bug #19836]
The parser does not free the chain of `struct vtable`, which causes
memory leaks.
The following script reproduces this issue:
```
10.times do
100_000.times do
Ripper.parse("-> {")
end
puts `ps -o rss= -p #{$$}`
end
```
[Bug #19835]
The parser does not free the `tbl` of the `struct vtable` when there are
leftover `lvtbl` in the parser. This causes a memory leak.
The following script reproduces this issue:
```
10.times do
100_000.times do
Ripper.parse("class Foo")
end
puts `ps -o rss= -p #{$$}`
end
```
```ruby
def helper_cant_rescue
begin
raise SyntaxError
rescue
cant_rescue # here
end
end
```
on this case, a line event is reported on `cant_rescue` line
because of node structure. it should not be reported.
This fixes an infinite loop possible after ec3542229b.
For \u{} escapes in regexps, skip validation in the parser, and rely on the regexp
code to handle validation. This is necessary so that invalid unicode escapes in
comments in extended regexps are allowed.
Fixes [Bug #19750]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
`yyparse` never changes the value of `coverage_enabled`.
`coverage_enabled` depends on only return value of `e_option_supplied`.
Therefore `parser_params` doesn't need to have `coverage_enabled.
Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
98637d421d changes the name of
the function. However this function is exported as global,
then change the name to origin one for keeping compatibility.
After 6c0925ba70, it was impossible
to distinguish between the presence or absence of `*`.
# Before the commit
Ripper.sexp('0 in []')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
Ripper.sexp('0 in [*]')[1][0][2][1] #=> [:aryptn, nil, nil, [:var_field, nil], nil]
# After the commit
Ripper.sexp('0 in []')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
Ripper.sexp('0 in [*]')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
This commit reverts it.
`gc_verify_internal_consistency` reports "found internal inconsistency"
for "test_pattern_matching.rb".
http://ci.rvm.jp/results/trunk-gc-asserts@ruby-sp2-docker/4501173
Ruby's parser manages objects by two different ways.
1. For parser
* markable node holds objects
* call `RB_OBJ_WRITTEN` with `p->ast` as parent
* `mark_ast_value` marks objects
2. For ripper
* unmarkable node, NODE_RIPPER/NODE_CDECL, holds objects
* call `rb_ast_add_mark_object`. This function calls `rb_hash_aset` then
`RB_OBJ_WRITTEN` is called with `mark_hash` as parent
* `mark_hash` marks objects
However in current pattern_matching implementation
* markable node holds objects
* call `rb_ast_add_mark_object`
This commit fix it to be #2.
This was inconsistency however always `mark_hash` is
made young by `rb_ast_add_mark_object` call then objects
are not collected.
This was introduced by b609bdeb53
to suppress warnings. However these warngins were deleted by
beae6cbf0f. Therefore these codes
are not needed anymore.
If the rescue clause has only exc_var and not exc_list, use the
exc_var position instead of the rescue body position.
This issue appears to have been introduced in
688169fd83 when "opt_list" was split
into "exc_list exc_var".
Fixes [Bug #18974]
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
Assign internal_id to semantic value so that dump parsetree option
can render the tree for these codes without SEGV.
* `def m(&); end`
* `def m(*); end`
* `def m(**); end`
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
"end" after "." or "::" is treated as local variable or method,
see `EXPR_DOT_bit` for detail.
However this "changes" where `bar` method is defined. In the example
below it is not module Z but class Foo.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.
Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.
Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class. Handle nested character classes as well.
This issue doesn't just affect extended regexps, it also affects
"(#?" comments inside all regexps. So for those comments, scan
until trailing ")" and ignore content inside.
I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescaped during parsing instead of before parsing, so you already
know the current parsing state.
Fixes [Bug #18294]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
This allows for the following syntax:
```ruby
def foo(*)
bar(*)
end
def baz(**)
quux(**)
end
```
This is a natural addition after the introduction of anonymous
block forwarding. Anonymous rest and keyword rest arguments were
already supported in method parameters, this just allows them to
be used as arguments to other methods. The same advantages of
anonymous block forwarding apply to rest and keyword rest argument
forwarding.
This has some minor changes to #parameters output. Now, instead
of `[:rest], [:keyrest]`, you get `[:rest, :*], [:keyrest, :**]`.
These were already used for `...` forwarding, so I think it makes
it more consistent to include them in other cases. If we want to
use `[:rest], [:keyrest]` in both cases, that is also possible.
I don't think the previous behavior of `[:rest], [:keyrest]` in
the non-... case and `[:rest, :*], [:keyrest, :**]` in the ...
case makes sense, but if we did want that behavior, we'll have to
make more substantial changes, such as using a different ID in the
... forwarding case.
Implements [Feature #18351]
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope. It has no longer any actual difference with
`NODE_DASGN`, except for the node dump.
Dumped iseq binary can not have unnamed symbols/IDs, and ID 0 is
stored instead. As `struct rb_id_table` disallows ID 0, also for
the distinction, re-assign a new temporary ID based on the local
variable table index when loading from the binary, as well as the
parser.
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.
This change converts the hacky implementation to a normal struct.
block to another method without having to provide a name for the
block parameter.
Implements [Feature #11256]
Co-authored-by: Yusuke Endoh mame@ruby-lang.org
Co-authored-by: Nobuyoshi Nakada nobu@ruby-lang.org
`RubyVM.keep_script_lines` enables to keep script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
RubyVM::keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
When Bison reports "memory exhausted", it means the parser stack
depth reached the limit `YYMAXDEPTH` which is defaulted to 10_000,
but not memory allocation failed.
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
This fixes https://bugs.ruby-lang.org/issues/18038. The provided
reproduction showed that this happens in heredocs with double
interpolation. In this case `DSTR` was getting returned but needs to be
convered to a `EVSTR` which is what is returned by the function. There
may be an additional bug here that we weren't able to produce. It seems
odd that `STR` returns `DSTR` while everything else should return
`EVSTR` since the function is `new_evstr`.
[Bug #18038][ruby-core:104597]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Pin matching for local variables and constants is already supported,
and it is fairly simple to add support for these variable types.
Note that pin matching for method calls is still not supported
without wrapping in parentheses (pin expressions). I think that's
for the best as method calls are far more complex (arguments/blocks).
Implements [Feature #17724]
by merging `rb_ast_body_t#line_count` and `#script_lines`.
Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
Ruby uses a recursive algorithm for handling control/meta escapes
in strings (read_escape). However, the equivalent code for regexps
(tokadd_escape) in did not use a recursive algorithm. Due to this,
Handling of control/meta escapes in regexp did not have the same
behavior as in strings, leading to behavior such as the following
returning nil:
```ruby
/\c\xFF/ =~ "\c\xFF"
```
Switch the code for handling \c, \C and \M in literal regexps to
use the same code as for strings (read_escape), to keep behavior
consistent between the two.
Fixes [Bug #14367]
This change allows `def hello = puts "Hello"` without parentheses.
Note that `private def hello = puts "Hello"` does not parse for
technical reason.
[Feature #17398]
Although it was used just to suppress an "unsed argument" warning
in the same manner as other bison-provided functions, it has been
dropped since Bision 3.7.5. And we always suppress that
warnings.