Previously, we used the character following the found prefix to
determine whether the match ended on a broken character.
This caused surprising behaviour when a valid character was followed
by a UTF-8 continuation byte.
This commit changes the behaviour to instead look for the end of the
last character in the prefix.
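Below is a minimal sketch of the idea, assuming UTF-8 only. The helper names (`prefix_ends_on_char_boundary`, `utf8_lead_len`) are hypothetical and are not Ruby functions. Only the bytes of the prefix are inspected, so a continuation byte that happens to follow the prefix no longer affects the result. Treating stray continuation bytes as one-byte broken characters is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte length announced by a UTF-8 lead byte. Bytes that cannot start a
 * multibyte sequence (ASCII, continuation, 0xF8-0xFF) count as 1. */
static size_t utf8_lead_len(unsigned char c) {
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

static bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

/* True when the prefix s[0, prefix_len) does not cut its last character in
 * half. Bytes after the prefix are never read. */
static bool prefix_ends_on_char_boundary(const unsigned char *s, size_t prefix_len) {
    if (prefix_len == 0) return true;
    size_t i = prefix_len - 1;
    size_t back = 0;
    /* Step back over up to 3 continuation bytes toward the last lead byte. */
    while (back < 3 && i > 0 && is_continuation(s[i])) {
        i--;
        back++;
    }
    /* No lead byte found: only stray (broken) continuation bytes remain. */
    if (is_continuation(s[i])) return true;
    /* The last character starts at i and must finish inside the prefix. */
    return i + utf8_lead_len(s[i]) <= prefix_len;
}
```

For example, with the string `"a\x80"` and the prefix `"a"`, the sketch reports a clean boundary, whereas looking at the byte after the prefix would have found a continuation byte.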
[Bug #19784]
Co-authored-by: ywenc <ywenc@github.com>
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
The test sometimes fails with:
```
1) Failure:
TestProcess#test_warmup_frees_pages [test/ruby/test_process.rb:2749]:
<0> expected but was
<1>.
```
I think there's a page with an object that needs finalization, so run
a GC first to clear that object.
The test sometimes fails with:
```
1) Failure:
TestProcess#test_warmup_run_major_gc_and_compact [test/ruby/test_process.rb:2712]:
<2> expected but was
<3>.
```
Essentially, this change updates `yp_unescape_calculate_difference` to
not create syntax errors, and we rely entirely on
`yp_unescape_manipulate_string` to report syntax errors.
To do that, this PR adds another (!) parameter to `unescape`:
`yp_list_t *error_list`. When present, `unescape` reports syntax
errors (and otherwise does not).
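A hypothetical sketch of the "optional error list" pattern, using a simplified `error_list_t` in place of `yp_list_t` and a `\xHH` escape as the example. None of these names come from YARP. The pass that only calculates the difference would pass `NULL`, and the pass that manipulates the string would pass a real list, so each error is reported exactly once.

```c
#include <stddef.h>

/* Hypothetical stand-in for yp_list_t: a small fixed-size error log. */
typedef struct {
    const char *messages[8];
    size_t count;
} error_list_t;

/* Record an error only if the caller passed a list; NULL means "stay quiet". */
static void report_error(error_list_t *errors, const char *message) {
    if (errors != NULL && errors->count < 8) {
        errors->messages[errors->count++] = message;
    }
}

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Unescape one "\xHH" sequence at src (src[0] == '\\', src[1] == 'x').
 * Stores the byte in *out and returns how many source bytes were consumed. */
static size_t unescape_hex(const char *src, const char *end, unsigned char *out,
                           error_list_t *errors) {
    const char *p = src + 2;
    unsigned value = 0;
    size_t digits = 0;
    while (digits < 2 && p < end && hex_value(*p) >= 0) {
        value = value * 16 + (unsigned)hex_value(*p);
        p++;
        digits++;
    }
    if (digits == 0) {
        report_error(errors, "invalid hex escape");
    }
    *out = (unsigned char)value;
    return (size_t)(p - src);
}
```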
However, an edge case that needed to be addressed is reporting syntax
errors in this case:
```
?\u{1234 2345}
```
In a string context, it's possible to have multiple codepoints by
doing something like `"\u{1234 2345}"`; however, in the character
literal context, this is a syntax error -- only a single codepoint is
allowed.
Unfortunately, when `yp_unescape_manipulate_string` is called, there's
nothing to indicate that we are in a "character literal" context and
that only a single codepoint is valid.
To make this work, this PR:
- introduces a new static utility function in yarp.c,
`yp_char_literal_node_create_and_unescape`, which is called when
we're parsing `YP_TOKEN_CHARACTER_LITERAL`
- introduces a new (unexported) function,
`yp_unescape_manipulate_char_literal` which does the same thing as
`yp_unescape_manipulate_string` but tells `unescape` that only a
single codepoint is expected
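The single-codepoint rule can be sketched as a flag passed to the scanner for the body of `\u{...}`. This is a hypothetical illustration: `scan_unicode_braces` and its result codes are invented here, and YARP's real code reports errors through its error list instead of returning an enum.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Result codes invented for this sketch. */
typedef enum {
    UNICODE_OK,
    UNICODE_INVALID,             /* a non-hex character inside the braces */
    UNICODE_UNTERMINATED,        /* input ended before the closing '}' */
    UNICODE_TOO_MANY_CODEPOINTS  /* e.g. ?\u{1234 2345} in a character literal */
} unicode_result_t;

/* Scan the body of a "\u{...}" escape, starting just after the '{'.
 * Space-separated codepoints are stored in out (up to max_out of them) and
 * counted in *count. single_codepoint selects the character-literal rule. */
static unicode_result_t scan_unicode_braces(const char *p, const char *end,
                                            bool single_codepoint,
                                            unsigned long *out, size_t max_out,
                                            size_t *count) {
    *count = 0;
    for (;;) {
        while (p < end && *p == ' ') p++;        /* skip separators */
        if (p >= end) return UNICODE_UNTERMINATED;
        if (*p == '}') {
            if (single_codepoint && *count > 1) return UNICODE_TOO_MANY_CODEPOINTS;
            return UNICODE_OK;
        }
        unsigned long value = 0;
        size_t digits = 0;
        while (p < end && isxdigit((unsigned char)*p)) {
            int c = tolower((unsigned char)*p);
            value = value * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
            p++;
            digits++;
        }
        if (digits == 0) return UNICODE_INVALID;
        if (*count < max_out) out[*count] = value;
        (*count)++;
    }
}
```

The same input is accepted in a string context and rejected in a character literal context; only the flag differs.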
https://github.com/ruby/yarp/commit/f6a65840b5
We move all pooled pages to free pages at the start of incremental
marking, so we shouldn't run incremental marking only when we have run
out of free pages. Doing so causes incremental marking to always complete
in a single step.
Popped was slightly inaccurate for ConstantNodes, which led to issues
when there was content after a ConstantNode. This fix doesn't pop
any ConstantWriteNode values.
If we are in a minor GC and the object to mark is old, then the old
object should already be marked and cannot be reclaimed in this GC cycle,
so we don't need to add it to the weak references list.
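A toy sketch of the check, with hypothetical `object_t` and `gc_state_t` types (Ruby's GC keeps the old and mark bits in page bitmaps, not in the object like this). Weak reference slots are only recorded when their referent could actually die in the current cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object header, for illustration only. */
typedef struct {
    bool old;      /* promoted to the old generation */
    bool marked;
} object_t;

typedef struct {
    bool minor_gc;                 /* true during a minor (young-generation) GC */
    object_t **weak_refs[16];      /* slots holding weak references to revisit */
    size_t weak_count;
} gc_state_t;

/* Remember a slot holding a weak reference so it can be cleared after
 * marking if its referent turns out to be dead. */
static void mark_weak(gc_state_t *gc, object_t **slot) {
    object_t *obj = *slot;
    if (obj == NULL) return;
    /* Old objects survive every minor GC, so their slots never need clearing. */
    if (gc->minor_gc && obj->old) return;
    if (gc->weak_count < 16) gc->weak_refs[gc->weak_count++] = slot;
}
```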
* Consistent with ClassVariableWriteNode, ConstantWriteNode, InstanceVariableWriteNode, GlobalVariableWriteNode.
* Fixes desugaring of local variable with operators.
https://github.com/ruby/yarp/commit/9a66737775
* Consistent with ClassVariableWriteNode, ConstantWriteNode, InstanceVariableWriteNode, LocalVariableWriteNode.
* Fixes desugaring of global variable with operators.
https://github.com/ruby/yarp/commit/fb5a53fc0b
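As a rough, hypothetical illustration of what "desugaring a variable write with an operator" means at the source level: `x += 1` behaves like `x = x + 1`, while `||=` and `&&=` short-circuit and so cannot be expanded the same way. The helper below works on strings only and is not how YARP represents or compiles these nodes; the `||=` expansion is also a simplification of Ruby's exact semantics.

```c
#include <stdio.h>
#include <string.h>

/* Render `name op= value` in a desugared form into buf.
 * Returns snprintf's result (the length that would have been written). */
static int desugar_operator_write(const char *name, const char *op, const char *value,
                                  char *buf, size_t size) {
    /* Short-circuiting operators only assign when the left side decides to. */
    if (strcmp(op, "||") == 0 || strcmp(op, "&&") == 0) {
        return snprintf(buf, size, "%s %s (%s = %s)", name, op, name, value);
    }
    /* Other binary operators read, combine, then write back. */
    return snprintf(buf, size, "%s = %s %s %s", name, name, op, value);
}
```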
file
(https://github.com/ruby/yarp/pull/1371)
* refactor: move EOF check into yp_unescape_calculate_difference
parser_lex is a bit more readable when we can rely on that behavior
* fix: octal and hex digits at the end of a file
Previously this resulted in invalid memory access.
* fix: unicode strings at the end of a file
Previously this resulted in invalid memory access.
* Unterminated curly-bracket unicode is a syntax error
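The end-of-file fixes come down to never reading a digit without first checking it is inside the buffer. A minimal sketch for octal escapes, with a hypothetical `scan_octal_escape` helper (not YARP's code), where `p` points just past the backslash:

```c
#include <stddef.h>

/* Parse up to three octal digits starting at p, never reading at or past end,
 * so an escape at the very end of the input stays in bounds. Stores the low
 * byte of the value in *out and returns the number of digits consumed. */
static size_t scan_octal_escape(const char *p, const char *end, unsigned char *out) {
    unsigned value = 0;
    size_t digits = 0;
    while (digits < 3 && p + digits < end && p[digits] >= '0' && p[digits] <= '7') {
        value = value * 8 + (unsigned)(p[digits] - '0');
        digits++;
    }
    *out = (unsigned char)value;   /* keep the low byte */
    return digits;
}
```

The same bounds check applies to hex and `\u` escapes: test `p < end` before every dereference.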
https://github.com/ruby/yarp/commit/21cf11acb5
This is an internal-only function not exposed to the C extension API.
Its only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
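A toy sketch of the reference-updating step: after compaction, every slot in the array is rewritten to the object's new address. The `value_t` type, the linear forwarding table, and `update_values` are hypothetical simplifications; Ruby's GC finds new locations differently, and this is not the signature of rb_gc_update_values.

```c
#include <stddef.h>

typedef unsigned long value_t;  /* stand-in for Ruby's VALUE */

/* Toy forwarding table: after compaction, object from[i] lives at to[i]. */
typedef struct {
    const value_t *from;
    const value_t *to;
    size_t len;
} forwarding_t;

/* New address of v, or v itself if it was not moved. */
static value_t forwarded_location(const forwarding_t *fw, value_t v) {
    for (size_t i = 0; i < fw->len; i++) {
        if (fw->from[i] == v) return fw->to[i];
    }
    return v;
}

/* Rewrite each of the n slots in values to point at the moved object. */
static void update_values(const forwarding_t *fw, size_t n, value_t *values) {
    for (size_t i = 0; i < n; i++) {
        values[i] = forwarded_location(fw, values[i]);
    }
}
```

Because the array is fixed up after objects move, the marking pass no longer has to pin them.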
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin