This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.
While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.
Sample output on A64:
```
# regenerate_branch
# Insn: 0001 opt_send_without_block (stack_size: 1)
# guard known object with singleton class
0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
# RUBY_VM_CHECK_INTS(ec)
0x11f7e0050: 8b 02 42 b8 cb 07 01 35
# stack overflow check
0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
# save PC to CFP
0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
0x11f7e0073: f8 ab 82 00 91
```
To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact on compile time and memory usage, using the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistical summaries were produced with the `summary` function in R.
Compile time, sample size of 60, lower is better:
```
Before               After
Min.   :2.054e+09    Min.   :2.028e+09
1st Qu.:2.069e+09    1st Qu.:2.044e+09
Median :2.081e+09    Median :2.060e+09
Mean   :2.089e+09    Mean   :2.066e+09
3rd Qu.:2.109e+09    3rd Qu.:2.085e+09
Max.   :2.146e+09    Max.   :2.144e+09
```
Allocation size, sample size of 20, lower is better:
```
Before               After
Min.   :21804742     Min.   :21794082
1st Qu.:21826682     1st Qu.:21816282
Median :21844042     Median :21826814
Mean   :21960664     Mean   :22026291
3rd Qu.:21861228     3rd Qu.:22040439
Max.   :22587426     Max.   :22930614
```
The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
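The relative changes quoted above can be checked against the summary tables with a few lines of Ruby. This is only arithmetic on the reported summary values, not a reanalysis of the raw samples; `pct_change` is a hypothetical helper name.

```ruby
# Hypothetical helper: relative change from `before` to `after`, in percent.
# A positive result means the "After" value is larger.
def pct_change(before, after)
  (after - before).to_f / before * 100
end

# Summary values copied from the tables above.
compile_mean = pct_change(2.089e9, 2.066e9)
alloc_mean   = pct_change(21_960_664, 22_026_291)
alloc_median = pct_change(21_844_042, 21_826_814)

printf("compile_time_ns mean:   %+.2f%%\n", compile_mean) # -1.10%
printf("yjit_alloc_size mean:   %+.2f%%\n", alloc_mean)   # +0.30%
printf("yjit_alloc_size median: %+.2f%%\n", alloc_median) # -0.08%
```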
We also need to protect the removal of the previous binstub; otherwise,
the following can happen:
* Process A removes prior binstub FOO.
* Process B removes prior binstub FOO (does nothing actually because Process A already removed it).
* Process A writes binstub FOO for gem BAR from the beginning of file.
* Process B writes binstub FOO for gem BAZ from the beginning of file.
As before, if binstub FOO for gem BAR is bigger than binstub FOO for gem
BAZ, garbage bytes will be left at the end of the file, corrupting the
binstub.
The solution is to also protect removal of the previous binstub. To do
this, we use a file lock on an explicit `.lock` file.
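The lock-file approach can be sketched in Ruby as below. `with_binstub_lock` and `replace_binstub` are hypothetical names, not the RubyGems implementation, and writing is simplified to `File.write`. The point shown is that removal and writing happen inside the same critical section, guarded by a separate `.lock` file that is never deleted, so every process locks the same file.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: hold an exclusive flock on "<path>.lock" while the block runs.
def with_binstub_lock(path)
  File.open("#{path}.lock", File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until no other process holds the lock
    yield
  end # closing the lock file releases the lock
end

# Hypothetical helper: remove the previous binstub and write the new one
# without another process being able to interleave its own remove/write.
def replace_binstub(path, content)
  with_binstub_lock(path) do
    FileUtils.rm_f(path)
    File.write(path, content)
  end
end

# Usage: a shorter binstub replacing a longer one leaves no stale bytes.
result = Dir.mktmpdir do |dir|
  path = File.join(dir, "foo")
  replace_binstub(path, "binstub for gem myrack\n")
  replace_binstub(path, "binstub for gem fake\n")
  File.read(path)
end
p result
```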
https://github.com/rubygems/rubygems/commit/d99a80e62d
There's an issue when multiple processes try to write the same binstub.
The problem is that our file locking mechanism is incorrect because
files are truncated _before_ they are locked. So it can happen that:
* Process A truncates binstub FOO.
* Process B truncates binstub FOO.
* Process A writes binstub FOO for gem BAR from the beginning of file.
* Process B writes binstub FOO for gem BAZ from the beginning of file.
If binstub FOO for gem BAR is bigger than binstub FOO for gem BAZ, then
some bytes will be left around at the end of the binstub, making it
corrupt.
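The interleaving above can be reproduced deterministically in a single Ruby process, with two file handles standing in for Process A and Process B. `simulate_race` is a hypothetical name and the strings are placeholder binstub contents.

```ruby
require "tmpdir"

# Simulates the four steps above. Opening with "w" truncates the file,
# and each handle starts writing at offset 0.
def simulate_race(long, short)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "foo")
    a = File.open(path, "w") # Process A truncates binstub FOO
    b = File.open(path, "w") # Process B truncates binstub FOO
    a.write(long)            # A writes the longer binstub from offset 0
    a.flush
    b.write(short)           # B writes the shorter binstub from offset 0
    b.flush
    a.close
    b.close
    File.read(path)          # the tail of A's content is left behind
  end
end

p simulate_race("binstub for gem myrack\n", "binstub for gem fake\n")
```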
This was not a problem in our specs until the spec testing binstubs with
the same name coming from different gems changed from using gems named
"fake" and "rack" to using gems named "fake" and "myrack". Because of
the difference in gem name length, the generated binstub for gem
"myrack" is now longer, causing the above problem if the binstub for gem
"myrack" is written first.
The solution is to always open files with modes that DON'T truncate them
when using flock. So we use `r+` if the file already exists (this mode
requires it to), and `a+` otherwise.
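A minimal Ruby sketch of this rule, assuming the content is rewritten in full once the lock is held. `write_with_flock` is a hypothetical helper, not the RubyGems code. Truncation happens only after `flock` succeeds, so a waiting process can no longer empty the file underneath a writer. As the later commit above notes, the `File.exist?` check can still race with another process removing the file.

```ruby
require "tmpdir"

# Hypothetical helper: rewrite `path` under an exclusive flock.
def write_with_flock(path, content)
  # "r+" requires the file to exist; "a+" creates it if missing.
  # Neither mode truncates on open.
  mode = File.exist?(path) ? "r+" : "a+"
  File.open(path, mode) do |f|
    f.flock(File::LOCK_EX)
    f.truncate(0) # only safe now that we hold the lock
    f.rewind
    f.write(content)
  end
end

# Usage: the second, shorter write leaves no stale bytes behind.
result = Dir.mktmpdir do |dir|
  path = File.join(dir, "foo")
  write_with_flock(path, "binstub for gem myrack\n") # new file: opened with "a+"
  write_with_flock(path, "binstub for gem fake\n")   # existing file: opened with "r+"
  File.read(path)
end
p result
```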
https://github.com/rubygems/rubygems/commit/ce8bcba90f
We do not implement CET shadow-stack switching in amd64 Context.S. If
you compile Ruby with `-fcf-protection=full` and run it with
`GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK` exported, it will crash with a
control flow exception.
Configure the appropriate notes at the end of Context.S.
[Bug #18061]
The dtrace python script from systemtap on Linux actually looks at the
CFLAGS environment variable when invoking gcc to make the probes.o file.
If we don't pass the CFLAGS we're using, this probes.o file can wind up
without the required annotations indicating that it supports e.g. Intel
CET.
Fix this by explicitly exporting our build flags to the environment for
this script.
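The fix itself lives in Ruby's build system; the sketch below only illustrates the underlying idea in Ruby: a child tool that reads `ENV["CFLAGS"]` sees our flags only if they are placed in its environment. `run_with_cflags` is hypothetical, and a child Ruby process stands in for the systemtap dtrace script.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a build tool with CFLAGS set in its environment.
def run_with_cflags(cflags, *cmd)
  out, status = Open3.capture2({ "CFLAGS" => cflags }, *cmd)
  raise "command failed: #{cmd.join(' ')}" unless status.success?
  out
end

# A child Ruby stands in for the tool and reports the CFLAGS it received.
output = run_with_cflags("-O2 -fcf-protection=full",
                         RbConfig.ruby, "-e", "print ENV['CFLAGS']")
puts output
```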
[Bug #18061]
This partially reverts https://github.com/ruby/ruby/pull/10944; now that
we decided to pass CFLAGS to $(CC) when assembling .S files, we don't
need these autoconf macros that capture the state of
__ARM_FEATURE_{PAC|BTI}_DEFAULT.
[Bug #20601]
We already assemble our assembly files using the $(CC) compiler driver,
rather than the actual $(AS) assembler. This means that
* The C preprocessor gets run on the assembly file
* It's valid to pass gcc-style flags to it, such as
  -mbranch-protection or -fcf-protection
* If you do so, the relevant preprocessor macros like __CET__ get set
* If you really wanted to pass assembler flags, you would need to do
  that using -Wa,... anyway
So I think it makes sense to pass "$(XCFLAGS) $(CFLAGS) $(CPPFLAGS)" to
gcc/clang/etc when assembling, rather than passing $(ASFLAGS) (since
the flags are not actually passed to `as`, but `cc`!).
The side effect of this is that if there are mitigation flags like
-fcf-protection in $CFLAGS, then the relevant macros like __CET__ will
be defined when assembling the files.
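As a sketch of the resulting command shape, the hypothetical helper below builds the compiler-driver invocation for a `.S` file from the three flag variables named above. The flag values are placeholders; the real rule is in Ruby's Makefiles.

```ruby
require "shellwords"

# Hypothetical helper: assemble a .S file through the C compiler driver,
# passing the C flags (so macros like __CET__ get defined) instead of $(ASFLAGS).
def assemble_command(cc:, xcflags:, cflags:, cppflags:, src:, obj:)
  [cc,
   *Shellwords.split(xcflags),
   *Shellwords.split(cflags),
   *Shellwords.split(cppflags),
   "-c", src, "-o", obj]
end

cmd = assemble_command(cc: "gcc", xcflags: "", cflags: "-O2 -fcf-protection=full",
                       cppflags: "-DNDEBUG", src: "Context.S", obj: "Context.o")
puts cmd.shelljoin
# Flags meant for `as` itself would still need the -Wa, prefix, e.g. -Wa,--noexecstack.
```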
[Bug #20601]
Previously, a TypeError was not raised for a key that is neither a
Symbol nor a String if the thread had no thread variables, because the
conversion to symbol was done after that check. Convert the key to a
symbol before checking whether thread variables are set, to make the
behavior consistent.
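A small Ruby illustration of the behavior being made consistent, using `Thread#thread_variable_get` as the example. Only the case where the thread already has a thread variable raises on every Ruby version; the last case depends on this change being present.

```ruby
# A Symbol (or String) key works as usual.
value = Thread.new do
  Thread.current.thread_variable_set(:answer, 42)
  Thread.current.thread_variable_get(:answer)
end.value
p value # => 42

# With a thread variable already set, an Integer key is rejected.
error = Thread.new do
  Thread.current.thread_variable_set(:answer, 42)
  Thread.current.thread_variable_get(1)
rescue TypeError => e
  e
end.value
p error.class # => TypeError

# A fresh thread has no thread variables: with this change this is a
# TypeError as well; before it, the call returned nil.
fresh = Thread.new { Thread.current.thread_variable_get(1) rescue $! }.value
p fresh
```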
Fixes [Bug #20606]
This commit changes the external GC API to use `--with-shared-gc=DIR` at
configure time with a directory of the external GC, and to use the
`RUBY_GC_LIBRARY` environment variable to load the external GC at
runtime.