Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
This reverts commit 35136e1e9c.
test-spec has been failing since this revision.
.github/workflows/compilers.yml:82
https://github.com/ruby/ruby/actions/runs/4276884159/jobs/7445299562
```
env:
# Minimal flags to pass the check.
default_cc: 'gcc-11 -fcf-protection -Wa,--generate-missing-build-notes=yes'
optflags: '-O2'
LDFLAGS: '-Wl,-z,now'
# FIXME: Drop skipping options
# https://bugs.ruby-lang.org/issues/18061
# https://sourceware.org/annobin/annobin.html/Test-pie.html
TEST_ANNOCHECK_OPTS: "--skip-pie --skip-gaps"
```
Failure:
```
1)
An exception occurred during: Kernel#require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:317
Kernel#require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
2)
An exception occurred during: Kernel#require ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:330
Kernel#require ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
3)
An exception occurred during: Kernel#require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:535
Kernel#require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
4)
An exception occurred during: Kernel#require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:551
Kernel#require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded ERROR
LeakError: Leaked file descriptor: 9 : #<File:../code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
5)
An exception occurred during: Kernel#require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:563
Kernel#require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required ERROR
LeakError: Closed file descriptor: 8
Closed file descriptor: 9
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:5:in `<top (required)>'
6)
An exception occurred during: Kernel.require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:317
Kernel.require (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
7)
An exception occurred during: Kernel.require ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:330
Kernel.require ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
8)
An exception occurred during: Kernel.require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:535
Kernel.require ($LOADED_FEATURES) does not load a non-canonical path for a file already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
9)
An exception occurred during: Kernel.require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:551
Kernel.require ($LOADED_FEATURES) does not load a ../ relative path for a file already loaded ERROR
LeakError: Leaked file descriptor: 9 : #<File:../code/load_fixture.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
10)
An exception occurred during: Kernel.require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required
/__w/ruby/ruby/src/spec/ruby/core/kernel/shared/require.rb:563
Kernel.require ($LOADED_FEATURES) complex, enumerator, rational, thread, ruby2_keywords are already required ERROR
LeakError: Closed file descriptor: 8
Closed file descriptor: 9
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_spec.rb:23:in `<top (required)>'
11)
An exception occurred during: Kernel#require_relative with a relative path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:197
Kernel#require_relative with a relative path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:4:in `<top (required)>'
12)
An exception occurred during: Kernel#require_relative with a relative path ($LOADED_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:205
Kernel#require_relative with a relative path ($LOADED_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:4:in `<top (required)>'
13)
An exception occurred during: Kernel#require_relative with an absolute path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:399
Kernel#require_relative with an absolute path (file extensions) does not load a C-extension file if a complex-extensioned .rb file is already loaded ERROR
LeakError: Leaked file descriptor: 8 : #<File:/__w/ruby/ruby/src/spec/ruby/fixtures/code/load_fixture.ext.rb>
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:277:in `<top (required)>'
14)
An exception occurred during: Kernel#require_relative with an absolute path ($LOAD_FEATURES) stores an absolute path
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:407
Kernel#require_relative with an absolute path ($LOAD_FEATURES) stores an absolute path ERROR
LeakError: Closed file descriptor: 8
/__w/ruby/ruby/src/spec/ruby/core/kernel/require_relative_spec.rb:277:in `<top (required)>'
```
When loading Ruby source files, we can save the result of
successful opens as open(2)/openat(2) are a fairly expensive
syscalls. This also avoids a time-of-check-to-time-of-use
(TOCTTOU) problem.
This reduces open(2) syscalls during `require'; but should be
most apparent when users have a small $LOAD_PATH. Users with
large $LOAD_PATH will benefit less since there'll be more
open(2) failures due to ENOENT.
With `strace -c -e openat ruby -e exit' under Linux, this
results in a ~14% reduction of openat(2) syscalls
(glibc uses openat(2) to implement open(2)).
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 296 110 openat
0.00 0.000000 0 254 110 openat
Additionally, the introduction of `struct ruby_file_load_state'
may make future optimizations more apparent.
This change cannot benefit binary (.so) loading since the
dlopen(3) API requires a filename and I'm not aware of an
alternative that takes a pre-existing FD. In typical
situations, Ruby source files outnumber the mount of .so
files.
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
... as per ko1's preference. He is preparing to extend this feature to
ISeq for his new debugger. He prefers "keep" to "save" for this wording.
This API is internal and not included in any released version, so I
change it in advance.
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
According to MSVC manual (*1), cl.exe can skip including a header file
when that:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them include
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
These headers need no rewrite. Just add some minor tweaks, like
addition of #include lines. Mainly cosmetic.
TIMET_MAX_PLUS_ONE was deleted because the macro was used from only
one place (directly write expression there).
One day, I could not resist the way it was written. I finally started
to make the code clean. This changeset is the beginning of a series of
housekeeping commits. It is a simple refactoring; split internal.h into
files, so that we can divide and concur in the upcoming commits. No
lines of codes are either added or removed, except the obvious file
headers/footers. The generated binary is identical to the one before.