After 6c0925ba70, it was impossible
to distinguish between the presence or absence of `*`.
# Before the commit
Ripper.sexp('0 in []')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
Ripper.sexp('0 in [*]')[1][0][2][1] #=> [:aryptn, nil, nil, [:var_field, nil], nil]
# After the commit
Ripper.sexp('0 in []')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
Ripper.sexp('0 in [*]')[1][0][2][1] #=> [:aryptn, nil, nil, nil, nil]
This commit reverts it.
`gc_verify_internal_consistency` reports "found internal inconsistency"
for "test_pattern_matching.rb".
http://ci.rvm.jp/results/trunk-gc-asserts@ruby-sp2-docker/4501173
Ruby's parser manages objects by two different ways.
1. For parser
* markable node holds objects
* call `RB_OBJ_WRITTEN` with `p->ast` as parent
* `mark_ast_value` marks objects
2. For ripper
* unmarkable node, NODE_RIPPER/NODE_CDECL, holds objects
* call `rb_ast_add_mark_object`. This function calls `rb_hash_aset` then
`RB_OBJ_WRITTEN` is called with `mark_hash` as parent
* `mark_hash` marks objects
However in current pattern_matching implementation
* markable node holds objects
* call `rb_ast_add_mark_object`
This commit fix it to be #2.
This was inconsistency however always `mark_hash` is
made young by `rb_ast_add_mark_object` call then objects
are not collected.
This was introduced by b609bdeb53
to suppress warnings. However these warngins were deleted by
beae6cbf0f. Therefore these codes
are not needed anymore.
If the rescue clause has only exc_var and not exc_list, use the
exc_var position instead of the rescue body position.
This issue appears to have been introduced in
688169fd83 when "opt_list" was split
into "exc_list exc_var".
Fixes [Bug #18974]
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
Assign internal_id to semantic value so that dump parsetree option
can render the tree for these codes without SEGV.
* `def m(&); end`
* `def m(*); end`
* `def m(**); end`
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
"end" after "." or "::" is treated as local variable or method,
see `EXPR_DOT_bit` for detail.
However this "changes" where `bar` method is defined. In the example
below it is not module Z but class Foo.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]