We are seeing SEGVs in CI:
http://ci.rvm.jp/results/trunk-gc-asserts@ruby-sky1/2253563
This is happening because Ripper constructs AST nodes differently than
parse.y normally does. Specifically in this case Ripper is assigning 3
`VALUE` objects:
1febb6f4a1/parse.y (L757-L761)
Where parse.y will normally assign other things:
1febb6f4a1/parse.y (L11258-L11260)
The important one is the last one, the `struct rb_ary_pattern_info`. The
mark function assumed that `NODE_ARYPTN` have a pointer to `struct
rb_ary_pattern_info`, and used it:
1febb6f4a1/node.c (L1269-L1274)
In the case of Ripper, `NODE_ARYPTN` doesn't point to an
`rb_ary_pattern_info`, so the mark function would SEGV. This commit
changes Ripper so that its `NODE_ARYPTN` nodes also point at an
`rb_ary_pattern_info`, and the mark function can continue with the same
assumption.
Macros can't be expressions, that is a GNU extension (I didn't know
that). This commit converts the macro to a function so that everything
will compile correctly on non-GNU compatible compilers.
This patch changes parse.y to only use `add_mark_object` in Ripper.
Previously we were seeing a bug in write barrier verification. I had
changed `add_mark_object` to execute the write barrier, but the problem
is that we had code like this:
```
NEW_STR(add_mark_object(p, obj), loc)
```
In this case, `add_mark_object` would execute the write barrier between
the ast and `obj`, but the problem is that `obj` isn't actually
reachable from the AST at the time the write barrier executed.
`NEW_STR` can possibly call `malloc` which can kick a GC, and since
`obj` isn't actually reachable from the AST at the time of WB execution,
verification would fail.
Basically the steps were like this:
1. RB_OBJ_WRITTEN via `add_mark_object`
2. Allocate node
3. *Possibly* execute GC via malloc
4. Write obj in to allocated node
This patch changes the steps to:
1. Allocate node
2. *Possibly* execute GC via malloc
3. Write obj in to allocated node
4. RB_OBJ_WRITTEN
and NODE_ZARRAY to NODE_ZLIST.
NODE_ARRAY is used not only by an Array literal, but also the contents
of Hash literals, method call arguments, dynamic string literals, etc.
In addition, the structure of NODE_ARRAY is a linked list, not an array.
This is very confusing, so I believe `NODE_LIST` is a better name.
I guess those AST node were actually used for something, so we'd better
not touch them. Instead this commit just puts the tmpbuffer inside a
different internal struct so that we can mark them.
This commit adds two buckets for allocating NODE structs, then allocates
"markable" NODE objects from one bucket. The reason to do this is so
when the AST mark function scans nodes for VALUE objects to mark, we
only scan NODE objects that we know to reference VALUE objects. If we
*did not* divide the objects, then the mark function spends too much
time scanning objects that don't contain any references.
Now we can reach the ID table buffer from the id table itself, so when
SCOPE nodes are marked we can keep the buffers alive. This eliminates
the need for the "mark array" during normal parse / compile (IOW *not*
Ripper).
This patch changes the AST mark function so that it will walk through
nodes in the NODE buffer marking Ruby objects rather than using a mark
array to guarantee liveness. The reason I want to do this is so that
when compaction happens on major GCs, node objects will have their
references pinned (or possibly we can update them correctly).
This is broken at least since 2.5 (I didn't check earlier versions).
It resulted in failure in test_ast.rb when the tests were added before
the parser change.
Basically, in remove_duplicate_keys, if the node is modified, set
the location information to the previous location information. The
removal of keys should not affect the location in the code.
Previously, **{} was removed by the parser:
```
$ ruby --dump=parse -e '{**{}}'
@ NODE_SCOPE (line: 1, location: (1,0)-(1,6))
+- nd_tbl: (empty)
+- nd_args:
| (null node)
+- nd_body:
@ NODE_HASH (line: 1, location: (1,0)-(1,6))*
+- nd_brace: 1 (hash literal)
+- nd_head:
(null node)
```
Since it was removed by the parser, the compiler did not know
about it, and `m(**{})` was therefore treated as `m()`.
This modifies the parser to not remove the `**{}`. A simple
approach for this is fairly simple by just removing a few
lines from the parser, but that would cause two hash
allocations every time it was used. The approach taken here
modifies both the parser and the compiler, and results in `**{}`
not allocating any hashes in the usual case.
The basic idea is we use a literal node in the parser containing
a frozen empty hash literal. In the compiler, we recognize when
that is used, and if it is the only keyword present, we just
push it onto the VM stack (no creation of a new hash or merging
of keywords). If it is the first keyword present, we push a
new empty hash onto the VM stack, so that later keywords can
merge into it. If it is not the first keyword present, we can
ignore it, since the there is no reason to merge an empty hash
into the existing hash.
Example instructions for `m(**{})`
Before (note ARGS_SIMPLE):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 opt_send_without_block <callinfo!mid:m, argc:0, FCALL|ARGS_SIMPLE>, <callcache>
0004 leave
```
After (note putobject and KW_SPLAT):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putobject {}
0003 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0006 leave
```
Example instructions for `m(**h, **{})`
Before and After (no change):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putspecialobject 1
0003 newhash 0
0005 putself
0006 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0009 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0012 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0015 leave
```
Example instructions for `m(**{}, **h)`
Before:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putspecialobject 1
0003 newhash 0
0005 putself
0006 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0009 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0012 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0015 leave
```
After (basically the same except for the addition of swap):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 newhash 0
0003 putspecialobject 1
0005 swap
0006 putself
0007 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0013 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0016 leave
```
This reverts the changes to parse.y in
a5b37262524ac39d2af13eea174486370a581c23 as they are not actually
needed and cause the warning for duplicate hash keys to not be
emitted.
The on_params hook will use :nil as the keyword rest argument.
There is a new on_nokw_param hook as well.
This fixes a type issue in the previous code, where an ID was
passed where a VALUE was the declared type. The symbol :nil is
passed instead of the id.
This syntax means the method should be treated as a method that
uses keyword arguments, but no specific keyword arguments are
supported, and therefore calling the method with keyword arguments
will raise an ArgumentError. It is still allowed to double splat
an empty hash when calling the method, as that does not pass
any keyword arguments.
`rb_ast_t` holds a reference to this object, so it should mark the
object. Currently it is relying on the `mark_ary` on `node_buffer` to
ensure that the object stays alive. But since the array internals can
move, this could cause a segv if compaction impacts the array.
Single assignment with rescue modifier applies rescue to the RHS:
a = raise rescue 1 # a = (raise rescue 1)
Previously, multiple assignment with rescue modifier applied rescue
to the entire expression:
a, b = raise rescue [1, 2] # (a, b = raise) rescue [1, 2]
This makes multiple assignment with rescue modifier consistent with
single assignment with rescue modifier, applying rescue to the RHS:
a, b = raise rescue [1, 2] # a, b = (raise rescue [1, 2])
Implements [Feature #8239]
Fixes [Bug #8279]
* parse.y (yycompile): make sure in advance that the `__FILE__`
object shares a fstring, to get rid of dangling path name.
Fixed up 53e9908d8a. [Bug #16041]
* vm_eval.c (eval_make_iseq): ditto.
As a comment token includes the newline, so delayed newline token
just follows it should not be dispatched. [Bug #11485]
Co-Authored-By: Jeremy Evans <code@jeremyevans.net>
* parse.y (value_expr_check): if either of `then` or `else`
statements is not a void value expression, the whole `if` is not
also a void value expression. [Bug #15932]
* string.c (str_replace_shared_without_enc): free previous buffer
before replaced.
* parse.y (gettable): make sure in advance that the `__FILE__`
object shares a fstring, to get rid of replacement with the
fstring later.
TODO: this hack may be needed in other places.
[Bug #15916]
Co-Authored-By: luke-gru (Luke Gruber) <luke.gru@gmail.com>
`(ID)1` was assigned to NODE_ARGS#rest_arg for `{|x,| }`.
This change removes the magic number by introducing an explicit macro
variable for it: NODE_SPECIAL_EXCESSED_COMMA.
Terminate the input from a TTY by 2 ^D at the middle of line, like
as many programs, `cat`, `perl` and so on, do. By the first ^D,
the line will be sent without a newline, and then EOF will be send
by the next ^D.
* parse.y (here_document): broke the terminator condition down
into each piece, the positional condition, resetting the
dedented here-document indentation, and matching identifier.
suppress a false warning by icc.
Heredocs are parsed line-by-line, so we need to keep track of the
temporary encoding of the string. Previously, a heredoc would
only detect mixed encoding errors if they were on the same line,
this changes things so they will be caught on different lines.
Fixes [Bug #15839]
This allows cases such as:
```ruby
m ->(a = ->{@1}) {a}
m.call.call(1)
m2 ->(a: ->{@1}) {a}
m2.call.call(2)
```
Previously, this would cause a syntax error.
[Bug#15789]
Because `proc{|| @1}` is a syntax error, the following should
also be syntax errors:
```ruby
proc { |
| @1}
```
```ruby
proc { |; a| @1 }
```
This fixes both cases.
[Bug #15825]
* parse.y (heredoc_identifier): CR in here-document identifier
might or might not result in a syntax error, by the EOL code.
make a syntax error regardless of the EOL code.
* parse.y (heredoc_identifier): quoted here-document identifier
must end within the same line.
the only corner case that here-document identifier can contain a
newline is that the closing qoute is placed at the beginning of
the next line, and has been warned since 2.4.
```ruby
<<"EOS
" # warning: here document identifier ends with a newline
EOS
```
* parse.y (internal_id): number the ID serial for internal use by
counting down from the neary maximum value, not to accidentally
match permanent IDs.
[Bug #15768]
* parse.y (parser_append_options): explicitly pass $; when auto
splitting, to suppress the warning for non-nil $;.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67608 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (rb_parser_fatal): fix "parser" in the message which was
replaced accidentally. it is not the argument name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67503 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Because hard to specify commits related to r67479 only.
So please commit again.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67499 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (rb_parser_fatal): fix "parser" in the message which was
replaced accidentally. it is not the argument name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y: show compile error and the error line separately,
instead of building the error message by snprintf then yyerror.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67358 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (parser_token_value_print): in ripper, ID values are
wrapped in NODE_RIPPER at set_yylval_name(), so print the Symbol
wrapped together.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67305 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (lambda): support numbered parameters, only when no
argument list including empty parentheses, like empty vertical
bars. [ruby-core:91859] [Bug #15672]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (set_yylval_noname): continue after an invalid global,
instance, class variable name, without "unexpected end-of-file"
error.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67292 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (parser_numbered_param): hoisted out the contextual
check for numbered parameters.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (parse_atmark): exclude punctuation follows @ marks,
whereas it is inclusive after $ mark as some punctuation global
variables exist.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67254 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (parse_gvar): show the source line erred by invalid
global variable, and indicate the variable including the wrong
punctuation.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67251 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* parse.y (ruby_show_error_line): hoisted out from parser_yyerror.
* parse.y (regx_options): revert r67226 and show the error line
separately.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e