Commit graph

113 commits

Author SHA1 Message Date
yui-knk 2992e1074a Refactor parser compile functions
Refactor parser compile functions to reduce the dependence
on ruby functions.
This commit includes these changes

1. Refactor `gets`, `input` and `gets_` of `parser_params`

The parser needs two different pieces of data to get the next line: a function (`gets`) and input data (`input`).
However `gets_` is used for both function (`call`) and input data (`ptr`).
`call` is used for managing general callback function when `rb_ruby_parser_compile_generic` is used.
`ptr` is used for managing the current pointer on String when `parser_compile_string` is used.
This commit changes the parser to use only `gets` and `input`, then removes `gets_`.

2. Move parser_compile functions and `gets` functions from parse.y to ruby_parser.c

This change reduces the dependence on ruby functions from parser.

3. Change ruby_parser and ripper to take care of `VALUE input` GC mark

Move the responsibility of calling `rb_gc_mark` for `VALUE input` from parser to ruby_parser and ripper.
`input` is arbitrary data pointer from the viewpoint of parser.

4. Introduce rb_parser_compile_array function

The caller of `rb_parser_compile_generic` needs to take care of GC because ruby_parser doesn’t know
the details of `lex_gets` and `input`.
Introduce `rb_parser_compile_array` to reduce the complexity of ast.c.
2024-04-23 07:20:22 +09:00
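The `gets`/`input` split described in commit 2992e1074a above can be illustrated with a minimal, self-contained sketch. Every name here (`struct parser`, `line_t`, `gets_from_string`, `gets_from_array`, `count_lines`) is hypothetical and chosen for illustration; none of it is CRuby's actual definition. The point it tries to show is that the parser core only sees a function pointer and an opaque data pointer, while the caller owns the input and keeps it alive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not CRuby's real definitions. */
typedef struct line {
    const char *ptr;  /* start of the line, NULL at end of input */
    size_t len;       /* length including the trailing newline, if any */
} line_t;

struct parser;
typedef line_t (*gets_func)(struct parser *p, void *input);

struct parser {
    gets_func gets;   /* how to fetch the next line */
    void *input;      /* opaque to the parser; owned by the caller */
};

/* Input state for reading from one in-memory string. */
struct string_input {
    const char *src;
    size_t pos;       /* current read position inside src */
};

static line_t
gets_from_string(struct parser *p, void *input)
{
    struct string_input *in = input;
    line_t line = { NULL, 0 };
    const char *start = in->src + in->pos;
    const char *nl;

    (void)p;
    if (*start == '\0') return line;      /* end of input */
    nl = strchr(start, '\n');
    line.ptr = start;
    line.len = nl ? (size_t)(nl - start) + 1 : strlen(start);
    in->pos += line.len;
    return line;
}

/* Input state for reading from an array of lines. */
struct array_input {
    const char **lines;
    size_t count;
    size_t index;
};

static line_t
gets_from_array(struct parser *p, void *input)
{
    struct array_input *in = input;
    line_t line = { NULL, 0 };

    (void)p;
    if (in->index >= in->count) return line;
    line.ptr = in->lines[in->index++];
    line.len = strlen(line.ptr);
    return line;
}

/* The parser core touches only `gets` and `input`. */
static size_t
count_lines(struct parser *p)
{
    size_t n = 0;

    assert(p->gets != NULL);
    while (p->gets(p, p->input).ptr != NULL) n++;
    return n;
}
```

Swapping the input source then means changing two fields of `struct parser`, without touching `count_lines`. That is roughly the decoupling the commit message describes for string input versus generic callbacks.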
HASUMI Hitoshi 9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
  - Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` is called
  - With regard to this, please see *Future tasks* below

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
HASUMI Hitoshi f5e387a300 Separate SCRIPT_LINES__ from ast.c
This patch suggests relocating the code dealing with `SCRIPT_LINES__` from ast.c to ruby_parser.c.

## Background

- I guess `AbstractSyntaxTree.of` method used to use `SCRIPT_LINES__` internally for some reason before
- However, now it appears `SCRIPT_LINES__` is no longer used meaningfully by the method
- As evidence of this, (and as my patch shows,) removing the function call of `rb_script_lines_for()` from `ast_s_of()` does not affect the result of `test/ruby/test_ast.rb`

Given the above, I think two possibilities can be considered:

- (A) `AbstractSyntaxTree.of` no longer needs `SCRIPT_LINES__` (I pick this)
- (B) We lack a test case of `AbstractSyntaxTree.of` that needs to use `SCRIPT_LINES__`

## Besides,

The current implementation causes strange behavior:

```console
ruby -e"SCRIPT_LINES__ = {__FILE__ => []}; puts RubyVM::AbstractSyntaxTree.of(->{ 1 + 2 }, keep_script_lines: true).script_lines"
=> `-e:1:in '<main>': undefined method 'script_lines' for nil (NoMethodError)`
```

I think this is a bug because `AbstractSyntaxTree.of` is not supposed to return `nil` even in this case.
This happens due to the ast.c's dependence on `SCRIPT_LINES__`.
And at the end of the `ast_s_of()`, `node_find()` can not find the target child node obviously because it doesn't make sense to look for a corresponding node made from the parameter of `AbstractSyntaxTree.of` in the AST tree made from the value of `{__FILE__ => []}`

## Solution

Since I think it is enough for `SCRIPT_LINES__` to be referenced only by ruby.c, I chose possibility "(A)" and wrote this patch, which moves `rb_script_lines_for()` from ast.c to ruby_parser.c.

So as the result:

- The `ast_s_of()` function no longer looks up `SCRIPT_LINES__`
- Even so, this patched code passes the existing tests
- The strange behavior above no longer happens (I also added a test for it)

Please correct me if I miss something🙏
2024-04-04 18:29:16 +09:00
yui-knk f057741c5d NODE_LIT is not used anymore 2024-04-04 13:17:26 +09:00
HASUMI Hitoshi 9a19cfd4cd [Universal Parser] Reduce dependence on RArray in parse.y
- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y
  - In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *`
  - Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t`
  - Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c

- Implement `rb_parser_str_escape()`
  - This is a port of the `rb_str_escape()` function in string.c
  - `rb_parser_str_escape()` does not depend on `VALUE` (RString)
  - Instead, it uses `rb_parser_string_t *`
  - This function is used when the `--dump=y` option is passed

- Because the universal parser is a work in progress, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future

- Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)
2024-03-12 17:17:52 +09:00
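To make the idea behind commit 9a19cfd4cd more concrete, here is a minimal sketch of a parser-owned growable array that does not depend on Ruby's `VALUE` or `RArray`. The names `parser_ary_t`, `parser_ary_new`, `parser_ary_push` and `parser_ary_free` are hypothetical; the real `rb_parser_ary_t` may differ in layout and API. Because such an array is not managed by the GC, its owner has to free it explicitly, which matches the motivation given for `script_lines_free()` in the later commit 9b1e97b211.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parser-owned array; not CRuby's rb_parser_ary_t. */
typedef struct parser_ary {
    void **data;   /* element pointers (tokens, script lines, ...) */
    size_t len;    /* number of elements in use */
    size_t capa;   /* number of allocated slots */
} parser_ary_t;

/* Returns an empty array, or NULL on allocation failure. */
static parser_ary_t *
parser_ary_new(void)
{
    return calloc(1, sizeof(parser_ary_t));
}

/* Appends elem, doubling the capacity when full.
 * Returns 0 on success and -1 on allocation failure. */
static int
parser_ary_push(parser_ary_t *ary, void *elem)
{
    assert(ary != NULL);
    if (ary->len == ary->capa) {
        size_t capa = ary->capa ? ary->capa * 2 : 4;
        void **data = realloc(ary->data, capa * sizeof(void *));
        if (!data) return -1;
        ary->data = data;
        ary->capa = capa;
    }
    ary->data[ary->len++] = elem;
    return 0;
}

/* Nothing here is GC-managed, so the owner must free it explicitly.
 * free_elem may be NULL when the elements are owned elsewhere. */
static void
parser_ary_free(parser_ary_t *ary, void (*free_elem)(void *))
{
    size_t i;

    if (!ary) return;
    if (free_elem) {
        for (i = 0; i < ary->len; i++) free_elem(ary->data[i]);
    }
    free(ary->data);
    free(ary);
}
```

A Ruby-facing layer could then build a Ruby Array from such a structure only when a caller asks for it, which is the role the commit message assigns to `ast_node_all_tokens()`.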
Kevin Newton 82a4c3af16 Add error for iseqs compiled by prism 2024-02-21 11:44:40 -05:00
yui-knk e7ab5d891c Introduce NODE_REGX to manage regexp literal 2024-02-21 08:06:48 +09:00
yui-knk 89cfc15207 [Feature #20257] Rearchitect Ripper
Introduce another semantic value stack for Ripper so that
Ripper can manage both Node and Ruby Object separately.
This rearchitecture of Ripper solves these issues.
Therefore adding test cases for them.

* [Bug 10436] https://bugs.ruby-lang.org/issues/10436
* [Bug 18988] https://bugs.ruby-lang.org/issues/18988
* [Bug 20055] https://bugs.ruby-lang.org/issues/20055

Checked that the differences in `Ripper.sexp` output for files under `/test/ruby`
are only in test_pattern_matching.rb.
The differences come from the differences between the
`new_hash_pattern_tail` functions of the parser and Ripper.
Ripper's `new_hash_pattern_tail` didn’t call `assignable`, so
`kw_rest_arg` wasn’t marked as a local variable.
This is also fixed by this commit.

```
--- a/./tmp/before/test_pattern_matching.rb
+++ b/./tmp/after/test_pattern_matching.rb
@@ -3607,7 +3607,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, "a", [984, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, "a", [985, 10]]],
+                    [:var_ref, [:@ident, "a", [985, 10]]],
                     :==,
                     [:hash, nil]]],
                   nil]]],
@@ -3662,7 +3662,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, "a", [993, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, "a", [994, 10]]],
+                    [:var_ref, [:@ident, "a", [994, 10]]],
                     :==,
                     [:hash,
                      [:assoclist_from_args,
@@ -3813,7 +3813,7 @@
                    [:command,
                     [:@ident, "raise", [1022, 10]],
                     [:args_add_block,
-                     [[:vcall, [:@ident, "b", [1022, 16]]]],
+                     [[:var_ref, [:@ident, "b", [1022, 16]]]],
                      false]]],
                   [:else, [[:var_ref, [:@kw, "true", [1024, 10]]]]]]]],
                nil,
@@ -3876,7 +3876,7 @@
                      [:@int, "0", [1033, 15]]],
                     :"&&",
                     [:binary,
-                     [:vcall, [:@ident, "b", [1033, 20]]],
+                     [:var_ref, [:@ident, "b", [1033, 20]]],
                      :==,
                      [:hash, nil]]]],
                   nil]]],
@@ -3946,7 +3946,7 @@
                      [:@int, "0", [1042, 15]]],
                     :"&&",
                     [:binary,
-                     [:vcall, [:@ident, "b", [1042, 20]]],
+                     [:var_ref, [:@ident, "b", [1042, 20]]],
                      :==,
                      [:hash,
                       [:assoclist_from_args,
@@ -5206,7 +5206,7 @@
                      [[:assoc_new,
                        [:@label, "c:", [1352, 22]],
                        [:@int, "0", [1352, 25]]]]]],
-                   [:vcall, [:@ident, "r", [1352, 29]]]],
+                   [:var_ref, [:@ident, "r", [1352, 29]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5299,7 +5299,7 @@
                       [:assoc_new,
                        [:@label, "c:", [1367, 34]],
                        [:@int, "0", [1367, 37]]]]]],
-                   [:vcall, [:@ident, "r", [1367, 41]]]],
+                   [:var_ref, [:@ident, "r", [1367, 41]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5931,7 +5931,7 @@
              [:in,
               [:hshptn, nil, [], [:var_field, [:@ident, "r", [1533, 11]]]],
               [[:binary,
-                [:vcall, [:@ident, "r", [1534, 8]]],
+                [:var_ref, [:@ident, "r", [1534, 8]]],
                 :==,
                 [:hash,
                  [:assoclist_from_args,
```
2024-02-20 17:33:58 +09:00
yui-knk 33c1e082d0 Remove ruby object from string nodes
String nodes hold a ruby string object in `VALUE nd_lit`.
This commit changes it to `struct rb_parser_string *string`
to reduce dependency on ruby objects.
Sometimes these strings are concatenated with other strings,
therefore string concatenation functions are needed.
2024-02-09 14:20:17 +09:00
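As a rough illustration of why commit 33c1e082d0 needs its own concatenation functions, the sketch below shows a parser-side string that lives outside the Ruby heap. The struct layout and the names `parser_string_new`, `parser_string_concat` and `parser_string_free` are assumptions made for this example; the real `struct rb_parser_string` carries different fields and is managed differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in; the real struct rb_parser_string differs. */
struct parser_string {
    size_t len;
    char *ptr;   /* heap-allocated, always NUL-terminated */
};

/* Copies len bytes of src into a new string. Returns NULL on failure. */
static struct parser_string *
parser_string_new(const char *src, size_t len)
{
    struct parser_string *str = malloc(sizeof(*str));

    if (!str) return NULL;
    str->ptr = malloc(len + 1);
    if (!str->ptr) {
        free(str);
        return NULL;
    }
    memcpy(str->ptr, src, len);
    str->ptr[len] = '\0';
    str->len = len;
    return str;
}

/* Appends tail to head in place, as needed for adjacent literals
 * such as "foo" "bar". head and tail must be distinct objects.
 * Returns 0 on success and -1 on allocation failure. */
static int
parser_string_concat(struct parser_string *head, const struct parser_string *tail)
{
    char *ptr;

    assert(head && tail && head != tail);
    ptr = realloc(head->ptr, head->len + tail->len + 1);
    if (!ptr) return -1;
    memcpy(ptr + head->len, tail->ptr, tail->len + 1);  /* copies the NUL too */
    head->ptr = ptr;
    head->len += tail->len;
    return 0;
}

static void
parser_string_free(struct parser_string *str)
{
    if (!str) return;
    free(str->ptr);
    free(str);
}
```

With a Ruby `String` the parser could rely on the existing string API for this; once the node holds a plain C struct, an equivalent helper has to exist on the parser side.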
Nobuyoshi Nakada e018036d89 Rename `nd_head` in `RNode_RESBODY` as `nd_next` 2024-01-28 11:12:22 +09:00
S.H 9b40f42c22 Introduce `NODE_ENCODING`
`__ENCODING__` was managed by `NODE_LIT` with an Encoding object.

Introduce `NODE_ENCODING` so that
1. `__ENCODING__` is detectable from AST Node.
2. Dependency on Ruby objects is reduced in parse.y
2024-01-27 08:11:10 +00:00
yui-knk db476cc71c Introduce NODE_SYM to manage symbol literal
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that

1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
2024-01-09 16:07:19 +09:00
yui-knk 7ffff3e043 Change numeric node value functions argument to `NODE *`
Change the argument to align with other node value functions
like `rb_node_line_lineno_val`.
2024-01-08 14:02:48 +09:00
S-H-GAMELINKS 1b8d01136c Introduce Numeric Node's 2024-01-07 09:24:34 +09:00
yui-knk 7a050638b1 Introduce NODE_FILE
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that

1. `__FILE__` is detectable from AST Node
2. Reduce dependency on ruby object
2024-01-02 14:19:42 +09:00
yui-knk 1ade170a6c Introduce NODE_LINE
`__LINE__` was managed by `NODE_LIT` with `Integer` object.
This commit introduces `NODE_LINE` so that

1. `__LINE__` is detectable from AST Node
2. Reduce dependency on ruby object
2023-12-29 18:32:27 +09:00
Nobuyoshi Nakada 45eee0cd94 Remove duplicate to_path conversion
`rb_file_open_str` calls `FilePathValue`, and the converted result is
not used in this function.
2023-11-02 10:06:03 +09:00
Nobuyoshi Nakada 13c9cbe09e Embed `rb_args_info` in `rb_node_args_t` 2023-10-30 00:19:43 +09:00
yui-knk 08e25985d1 Expand OP_ASGN1 nd_args to nd_index and nd_rvalue
ARGSCAT has been used for nd_args to hold index and rvalue,
because there was limitation on the number of members for Node.
We can easily change structure of node now, let's expand it.
2023-10-20 07:56:20 +09:00
yui-knk 3049b5e348 Differentiate VAR nodes 2023-10-09 13:33:36 +09:00
yui-knk 09b33ea15a Differentiate CALL nodes 2023-10-09 13:33:36 +09:00
yui-knk 529a651f82 Differentiate ASGN nodes 2023-10-07 17:54:35 +09:00
yui-knk f28d380374 Pass nd_value to NODE_REQUIRED_KEYWORD_P 2023-10-07 17:54:35 +09:00
Nobuyoshi Nakada a5cc6341c0 Remove `NODE_VALUES`
This node type was added for the multi-value experiment back in 2004.
The feature itself was removed after a few years, but this is its
remnant.
2023-10-06 03:39:58 +09:00
Nobuyoshi Nakada 696022a0cb Differentiate `NODE_BREAK`/`NODE_NEXT`/`NODE_RETURN` 2023-10-05 14:23:42 +09:00
Nobuyoshi Nakada 70e1635950 Move internal NODE_DEF_TEMP to parse.y 2023-10-05 14:23:42 +09:00
yui-knk 08239fd6af Use rb_node_args_t and rb_node_args_aux_t instead of NODE 2023-10-01 19:38:03 +09:00
yui-knk cecd1de2eb Use rb_node_opt_arg_t and rb_node_kw_arg_t instead of NODE 2023-10-01 09:19:42 +09:00
yui-knk d293d9e191 Expand pattern_info struct into ARYPTN Node and FNDPTN Node 2023-09-30 13:11:32 +09:00
yui-knk 68ae87546e Merge NODE_DEF_TEMP and NODE_DEF_TEMP2 2023-09-29 19:36:34 +09:00
yui-knk 37a783a30c Merge RNode_OP_ASGN2 and RNode_OP_ASGN22 2023-09-29 08:36:39 +09:00
yui-knk 74c6781153 Change RNode structure from union to struct
All kinds of AST nodes use the same struct RNode, which has u1, u2, u3 union members
for holding different kinds of data.
This has two problems.

1. Low flexibility of data structure

Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However, because they share the same
structure definition, three union members must be allocated for NODE_TRUE and
NODE_OP_ASGN2 must be split into another node.
This change removes the restriction and makes it possible to
change the data structure for each node type.

2. No compile time check for union member access

With a union, it’s the developer’s responsibility to use the correct member for each node type.
This change clarifies which node has which type of fields and enables compile time checks.

This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
2023-09-28 11:58:10 +09:00
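The two problems listed in commit 74c6781153 can be seen in a small sketch that contrasts a single generic node layout with per-type structs. This is a simplified model, not CRuby's node.h: the enum values, `old_node_t`, `node_if_t` and the helper `node_if_cast` are assumptions made for illustration, and the real `u1`/`u2`/`u3` unions held more member kinds.

```c
#include <assert.h>
#include <stddef.h>

enum node_type { NODE_TRUE, NODE_IF };

/* Before: one layout for every node type, with three generic slots. */
typedef struct old_node {
    enum node_type type;
    union { struct old_node *node; long value; } u1, u2, u3;
} old_node_t;

/* After: a common header plus one struct per node type. */
typedef struct node {
    enum node_type type;
} node_t;

typedef struct node_true {
    node_t node;        /* header only; NODE_TRUE needs no payload */
} node_true_t;

typedef struct node_if {
    node_t node;        /* header must be the first member */
    node_t *nd_cond;
    node_t *nd_body;
    node_t *nd_else;
} node_if_t;

/* Checked downcast from the common header to the concrete type.
 * Valid because the header is the first member of node_if_t. */
static node_if_t *
node_if_cast(node_t *n)
{
    assert(n->type == NODE_IF);
    return (node_if_t *)n;
}

/* With the old layout, reading u3 of a NODE_TRUE compiles silently.
 * With the new layout, `some_true_node.nd_cond` is a compile error
 * because node_true_t has no such field. */
```

In this model a `node_true_t` is strictly smaller than an `old_node_t`, and a node that needs a fourth field can simply declare it, which is the flexibility the commit message describes.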
Nobuyoshi Nakada 6aa16f9ec1 Move SCRIPT_LINES__ away from parse.y 2023-08-25 18:23:05 +09:00
yui-knk b481b673d7 [Feature #19719] Universal Parser
Introduce Universal Parser mode for the parser.
This commit includes these changes:

* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
  are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
2023-06-12 18:23:48 +09:00
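The mechanism in commit b481b673d7, where host functions reach the parser through a configuration struct, can be sketched as follows. The struct members (`alloc`, `release`, `warn`, `host`) and the functions `parser_new`, `parser_warn` and `parser_free` are hypothetical; the real `struct rb_parser_config_struct` has many more members with different names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical config; the real rb_parser_config_struct is much larger. */
typedef struct parser_config {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
    void (*warn)(void *host, const char *msg);
    void *host;                     /* opaque host state passed back to warn */
} parser_config_t;

typedef struct parser {
    const parser_config_t *config;  /* the parser's only link to its host */
} parser_t;

/* Allocates a parser through the host-provided allocator. */
static parser_t *
parser_new(const parser_config_t *config)
{
    parser_t *p;

    assert(config && config->alloc && config->release);
    p = config->alloc(sizeof(*p));
    if (!p) return NULL;
    p->config = config;
    return p;
}

/* Reports a warning through the host, if the host installed a handler. */
static void
parser_warn(parser_t *p, const char *msg)
{
    if (p->config->warn) p->config->warn(p->config->host, msg);
}

static void
parser_free(parser_t *p)
{
    if (p) p->config->release(p);
}

/* One possible host callback: count warnings instead of printing them. */
static void
count_warning(void *host, const char *msg)
{
    (void)msg;
    ++*(int *)host;
}
```

A host other than CRuby would fill the same struct with its own allocator and reporting functions, so the parser source would not need to call CRuby functions directly.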
yui-knk 5f65e8c5d5 Rename `rb_node_name` to the original name
98637d421d changed the name of
the function. However, this function is exported as a global symbol,
so change the name back to the original one to keep compatibility.
2023-05-24 20:54:48 +09:00
yui-knk 98637d421d Move `ruby_node_name` to node.c and rename prefix of the function 2023-05-23 18:05:35 +09:00
Nobuyoshi Nakada 2490b2e121 Add utility macros `DECIMAL_SIZE_OF` and `DECIMAL_SIZE_OF_BYTES` 2023-02-14 15:18:21 +09:00
yui-knk 979dd02e2f Check if the argument is Thread::Backtrace::Location object
[Bug #19262]
2023-01-06 09:22:09 +09:00
yui-knk d8601621ed Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementations of the Language Server Protocol (LSP) sometimes need token information.
For example, both `m(1)` and `m(1, )` have the same AST structure other than node locations,
so it's impossible to check the existence of `,` from the AST. However, in the latter case,
it might be better to suggest a list of variables for the second argument.
Token information is important for such cases.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendant nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless of the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
2022-11-21 09:01:34 +09:00
eileencodes 3391c51eff Add `node_id_for_backtrace_location` function
We want to use error highlight with eval'd code, specifically ERB
templates. We're able to recover the generated code for eval'd templates
and can get a parse tree for the ERB generated code, but we don't have a
way to get the node id from the backtrace location. So we can't pass the
right node into error highlight.

This patch gives us an API to get the node id from the backtrace
location so we can find the node in the AST.

Error Highlight PR: https://github.com/ruby/error_highlight/pull/26

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2022-10-31 13:39:56 +09:00
yui-knk 4bfdf6d06d Move `error` from top_stmts and top_stmt to stmt
With this change, syntax errors are recovered in smaller units.
In the case below, "DEFN :bar" is now at the same level as
"CLASS :Foo".

```
module Z
  class Foo
    foo.
  end

  def bar
  end
end
```

[Feature #19013]
2022-10-08 17:59:11 +09:00
yui-knk fbbdbdd891 Add error_tolerant option to RubyVM::AST
If this option is enabled, SyntaxError is not raised and a Node is
returned even if the passed script is broken.

[Feature #19013]
2022-10-08 17:59:11 +09:00
Peter Zhu 5f10bd634f Add ISEQ_BODY macro
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
2022-03-24 10:03:51 -04:00
Yusuke Endoh 0dc7816c43 Make RubyVM::AST.of work with code written in `-e` command-line option
[Bug #18434]
2021-12-26 20:57:34 +09:00
Yusuke Endoh 6a51c3e80c Make AST.of possible even under eval when keep_script_lines is enabled
Now the following code works without an exception.

```
RubyVM.keep_script_lines = true

eval(<<END)
def foo
end
END

p RubyVM::AbstractSyntaxTree.of(method(:foo))
```
2021-12-19 04:00:51 +09:00
Yusuke Endoh acac2b8128 Make RubyVM::AbstractSyntaxTree.of raise for backtrace location in eval
This check is needed to fix a bug of error_highlight when NameError
occurred in eval'ed code.
https://github.com/ruby/error_highlight/pull/16

The same check for proc/method has been already introduced since
64ac984129.
2021-12-19 03:51:37 +09:00
Nobuyoshi Nakada 54f0e63a8c Remove `NODE_DASGN_CURR` [Feature #18406]
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope.  It no longer has any actual difference from
`NODE_DASGN`, except for the node dump.
2021-12-13 12:53:03 +09:00
S.H ec7f14d9fa
Add `nd_type_p` macro 2021-12-04 00:01:24 +09:00
Yusuke Endoh feda058531 Refactor hacky ID tables to struct rb_ast_id_table_t
The implementation of local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.

This change converts the hacky implementation to a normal struct.
2021-11-21 08:59:24 +09:00
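The change described in commit feda058531 can be pictured by placing the old encoding next to a plain struct. This is a sketch under assumptions: `ID` is modelled as `unsigned long`, and the layout of `id_table_t` (including the `next` link) is hypothetical rather than a copy of `rb_ast_id_table_t`, whose exact fields the commit message does not state.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long ID;   /* stand-in for CRuby's ID type */

/* Before (as described): a bare ID array where tbl[0] is really the
 * element count and the IDs start at tbl[1]. */
static int
old_table_size(const ID *tbl)
{
    return tbl ? (int)tbl[0] : 0;
}

/* After: an explicit struct, so the size and the link are no longer
 * disguised as IDs. The field layout here is an assumption. */
typedef struct id_table {
    struct id_table *next;  /* explicit link for chained tables */
    int size;               /* number of entries in ids */
    ID ids[];               /* C99 flexible array member */
} id_table_t;

/* Allocates a table with room for size IDs. Returns NULL on failure. */
static id_table_t *
id_table_new(int size)
{
    id_table_t *tbl;

    assert(size >= 0);
    tbl = malloc(sizeof(*tbl) + (size_t)size * sizeof(ID));
    if (!tbl) return NULL;
    tbl->next = NULL;
    tbl->size = size;
    return tbl;
}
```

With the struct, `tbl->ids[0]` is always the first ID, so readers no longer need to remember an off-by-one convention.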
Yusuke Endoh 09fa773e04 ast.c: Use kept script_lines data instead of re-opening the source file (#5019)
2021-10-26 01:58:01 +09:00