The allocated parser string is never freed, which causes a memory leak.
The following code leaks memory:
Ripper.sexp_raw(DATA.read)
__END__
<<~EOF
a
#{1}
a
EOF
This change is reduce Ruby C API dependency for Universal Parser.
Reuse dedent_string functions in rb_ruby_ripper_dedent_string functions and remove dependencies on rb_str_modify and rb_str_set_len from the parser.
This commit switches the default parser to Prism. There are a
couple of additional changes related to this that are a part of
this as well to make this happen.
* Switch the default parser in parse.h
* Remove the Prism-specific workflow and add a parse.y-specific
workflow to CI so that it continues to be tested
* Update a few test exclusions since Prism has the correct
behavior but parse.y doesn't per
https://bugs.ruby-lang.org/issues/20504.
* Skips a couple of tests on RBS which are failing because they
are using RubyVM::AbstractSyntaxTree.of.
Fixes [Feature #20564]
Recently, `TestRubyLiteral#test_float` fails randomly.
```
1) Error:
TestRubyLiteral#test_float:
ArgumentError: SyntaxError#path changed: "(eval at /home/chkbuild/chkbuild/tmp/build/20240527T050036Z/ruby/test/ruby/test_literal.rb:642)"->"(eval at /home/chkbuild/chkbuild/tmp/build/20240527T050036Z/ruby/test/ruby/test_literal.rb:642)"
```
https://rubyci.s3.amazonaws.com/s390x/ruby-master/log/20240527T050036Z.fail.html.gz
According to Launchable, the first failure was on Apr 30.
This is just when 528c4501f4 was
committed. I don't know if the change is really the cause, but I want to
revert it once to see if the random failure disappears.
Refactor parser compile functions to reduce the dependence
on ruby functions.
This commit includes these changes
1. Refactor `gets`, `input` and `gets_` of `parser_params`
Parser needs two different data structure to get next line, function (`gets`) and input data (`input`).
However `gets_` is used for both function (`call`) and input data (`ptr`).
`call` is used for managing general callback function when `rb_ruby_parser_compile_generic` is used.
`ptr` is used for managing the current pointer on String when `parser_compile_string` is used.
This commit changes parser to used only `gets` and `input` then removes `gets_`.
2. Move parser_compile functions and `gets` functions from parse.y to ruby_parser.c
This change reduces the dependence on ruby functions from parser.
3. Change ruby_parser and ripper to take care of `VALUE input` GC mark
Move the responsibility of calling `rb_gc_mark` for `VALUE input` from parser to ruby_parser and ripper.
`input` is arbitrary data pointer from the viewpoint of parser.
4. Introduce rb_parser_compile_array function
Caller of `rb_parser_compile_generic` needs to take care about GC because ruby_parser doesn’t know
about the detail of `lex_gets` and `input`.
Introduce `rb_parser_compile_array` to reduce the complexity of ast.c.
This patch is part of universal parser work.
## Summary
- Decouple VALUE from members below:
- `(struct parser_params *)->debug_lines`
- `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
- They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
- In order to do this,
- Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
- Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`
## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
- Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
- After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
- With regard to this, please see *Future tasks* below
## Future tasks
- Decouple IMEMO from `rb_ast_t *`
- This lifts the five-members-restriction of Ruby object,
- So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
- Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
Need to separate `check_literal_when` function for parser and
ripper otherwise warning event is not dispatched because
parser `rb_warning1` is used in ripper.
This commit simplifies warnings for hash keys duplication and when clause duplication,
based on the discussion of https://bugs.ruby-lang.org/issues/20331.
Warnings are reported only when strings are same to ohters.
This commit changes `struct parser_params` lastline and nextline
from `VALUE` (String object) to `rb_parser_string_t *` so that
dependency on Ruby Object is reduced.
`parser_string_buffer_t string_buffer` is added to `struct parser_params`
to manage `rb_parser_string_t` pointers of each line. All allocated line
strings are freed in `rb_ruby_parser_free`.
Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
This reverts commit 35136e1e9c.
test-spec has been failing since this revision.
.github/workflows/compilers.yml:82
https://github.com/ruby/ruby/actions/runs/4276884159/jobs/7445299562
```
env:
# Minimal flags to pass the check.
default_cc: 'gcc-11 -fcf-protection -Wa,--generate-missing-build-notes=yes'
optflags: '-O2'
LDFLAGS: '-Wl,-z,now'
# FIXME: Drop skipping options
# https://bugs.ruby-lang.org/issues/18061
# https://sourceware.org/annobin/annobin.html/Test-pie.html
TEST_ANNOCHECK_OPTS: "--skip-pie --skip-gaps"
```
Failure:
```
1)
An exception occurred during: Kernel#require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:317
Kernel#require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
2)
An exception occurred during: Kernel#require ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:330
Kernel#require ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
3)
An exception occurred during: Kernel#require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:535
Kernel#require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
4)
An exception occurred during: Kernel#require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:551
Kernel#require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded ERROR
LeakError: Leaked file descriptor: 9 : #<File:../code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
5)
An exception occurred during: Kernel#require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:563
Kernel#require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required ERROR
LeakError: Closed file descriptor: 8
Closed file descriptor: 9
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
6)
An exception occurred during: Kernel.require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:317
Kernel.require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
7)
An exception occurred during: Kernel.require ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:330
Kernel.require ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
8)
An exception occurred during: Kernel.require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:535
Kernel.require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
9)
An exception occurred during: Kernel.require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:551
Kernel.require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded ERROR
LeakError: Leaked file descriptor: 9 : #<File:../code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
10)
An exception occurred during: Kernel.require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:563
Kernel.require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required ERROR
LeakError: Closed file descriptor: 8
Closed file descriptor: 9
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
11)
An exception occurred during: Kernel#require_relative with a relative path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:197
Kernel#require_relative with a relative path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:4:in `<top (required)>'
12)
An exception occurred during: Kernel#require_relative with a relative path ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:205
Kernel#require_relative with a relative path ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:4:in `<top (required)>'
13)
An exception occurred during: Kernel#require_relative with an absolute path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:399
Kernel#require_relative with an absolute path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:277:in `<top (required)>'
14)
An exception occurred during: Kernel#require_relative with an absolute path ($LOAD_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:407
Kernel#require_relative with an absolute path ($LOAD_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:277:in `<top (required)>'
```
When loading Ruby source files, we can save the result of
successful opens as open(2)/openat(2) are a fairly expensive
syscalls. This also avoids a time-of-check-to-time-of-use
(TOCTTOU) problem.
This reduces open(2) syscalls during `require'; but should be
most apparent when users have a small $LOAD_PATH. Users with
large $LOAD_PATH will benefit less since there'll be more
open(2) failures due to ENOENT.
With `strace -c -e openat ruby -e exit' under Linux, this
results in a ~14% reduction of openat(2) syscalls
(glibc uses openat(2) to implement open(2)).
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 296 110 openat
0.00 0.000000 0 254 110 openat
Additionally, the introduction of `struct ruby_file_load_state'
may make future optimizations more apparent.
This change cannot benefit binary (.so) loading since the
dlopen(3) API requires a filename and I'm not aware of an
alternative that takes a pre-existing FD. In typical
situations, Ruby source files outnumber the mount of .so
files.
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
According to MSVC manual (*1), cl.exe can skip including a header file
when that:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them include
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
These headers need no rewrite. Just add some minor tweaks, like
addition of #include lines. Mainly cosmetic.
TIMET_MAX_PLUS_ONE was deleted because the macro was used from only
one place (directly write expression there).
One day, I could not resist the way it was written. I finally started
to make the code clean. This changeset is the beginning of a series of
housekeeping commits. It is a simple refactoring; split internal.h into
files, so that we can divide and concur in the upcoming commits. No
lines of codes are either added or removed, except the obvious file
headers/footers. The generated binary is identical to the one before.