To reduce the number of locking for encoding manipulation,
enc_table::list is splited to ::default_list and ::additional_list.
::default_list is pre-allocated and no need locking to access to
the ::default_list. If additional encoding space is needed, use
::additional_list and this list need to use locking.
However, most of case, ::default_list is enough.
According to the log of ac803ab55d, I
found that a thread terminates silently due to "recycled object" of
id2ref:
```
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/lib/drb/drb.rb:366:in `_id2ref'"
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/lib/drb/drb.rb:366:in `to_obj'"
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/lib/drb/drb.rb:1528:in `to_obj'"
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/lib/drb/drb.rb:1847:in `to_obj'"
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/lib/drb/drb.rb:1136:in `method_missing'"
"/home/chkbuild/chkbuild/tmp/build/20201017T033002Z/ruby/test/rinda/test_rinda.rb:652:in `block in do_reply'"
```
https://rubyci.org/logs/rubyci.s3.amazonaws.com/rhel8/ruby-master/log/20201017T033002Z.log.html.gz
I believe that this unintentional thread termination has caused
intermittent timeout failure of `TestRingServer#test_do_reply`.
The root cause of the "recycled object" issue is a bug of
`TestRingServer#test_do_reply`. It creates a callback Proc object but
does not hold the reference to the object:
```
callback = DRb::DRbObject.new callback
```
The original "callback" object is GC'ed unintentionally.
I could consistently reproduce this issue on my machine by adding
`GC.stress = true` at the first of `_test_do_reply` method body.
This change uses another local variable name, "callback_orig", to keep
the original Proc object.
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.
This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
Buggy native extensions could have mark functions that cause stack
overflow. When a stack overflow happens during GC, Ruby used to recover
by raising an exception, which runs the interpreter. It's not safe to
run the interpreter during GC since the GC is in an inconsistent state.
This could cause object allocation during GC, for example.
Instead of running the interpreter and potentially causing a crash down
the line, fail fast and abort.
Update the lldb script so it can mostly recover a Ruby stack trace from
a core file. It's still missing line numbers and dealing with CFUNCs,
but you use it like this:
```
(lldb) rbbt ec
rb_control_frame_t TYPE
0x7f6fd6555fa0 EVAL ./bootstraptest/runner.rb error!!
0x7f6fd6555f68 METHOD ./bootstraptest/runner.rb main
0x7f6fd6555f30 METHOD ./bootstraptest/runner.rb in_temporary_working_directory
0x7f6fd6555ef8 METHOD /home/aaron/git/ruby/lib/tmpdir.rb mktmpdir
0x7f6fd6555ec0 BLOCK ./bootstraptest/runner.rb block in in_temporary_working_directory
0x7f6fd6555e88 CFUNC
0x7f6fd6555e50 BLOCK ./bootstraptest/runner.rb block (2 levels) in in_temporary_working_directory
0x7f6fd6555e18 BLOCK ./bootstraptest/runner.rb block in main
0x7f6fd6555de0 METHOD ./bootstraptest/runner.rb exec_test
0x7f6fd6555da8 CFUNC
0x7f6fd6555d70 BLOCK ./bootstraptest/runner.rb block in exec_test
0x7f6fd6555d38 CFUNC
0x7f6fd6555d00 TOP /home/aaron/git/ruby/bootstraptest/test_insns.rb error!!
0x7f6fd6555cc8 CFUNC
0x7f6fd6555c90 BLOCK /home/aaron/git/ruby/bootstraptest/test_insns.rb block in <top (required)>
0x7f6fd6555c58 METHOD ./bootstraptest/runner.rb assert_equal
0x7f6fd6555c20 METHOD ./bootstraptest/runner.rb assert_check
0x7f6fd6555be8 METHOD ./bootstraptest/runner.rb show_progress
0x7f6fd6555bb0 METHOD ./bootstraptest/runner.rb with_stderr
0x7f6fd6555b78 BLOCK ./bootstraptest/runner.rb block in show_progress
0x7f6fd6555b40 BLOCK ./bootstraptest/runner.rb block in assert_check
0x7f6fd6555b08 METHOD ./bootstraptest/runner.rb get_result_string
0x7f6fd6555ad0 METHOD ./bootstraptest/runner.rb make_srcfile
0x7f6fd6555a98 CFUNC
0x7f6fd6555a60 BLOCK ./bootstraptest/runner.rb block in make_srcfile
```
Getting the main execution context is difficult (it is stored in a
thread local) so for now you must supply an ec and this will make a
backtrace
generic_ivtbl is a process global table to maintain instance variables
for non T_OBJECT/T_CLASS/... objects. So we need to protect them
for multi-Ractor exection.
Hint: we can make them Ractor local for unshareable objects, but
now it is premature optimization.
(1) recorded_lock_rec > current_lock_rec should not be occurred
on rb_ec_vm_lock_rec_release().
(2) should be release VM lock at EXEC_TAG(), not POP_TAG().
(3) some refactoring.
enc_table which manages Encoding information. rb_encoding_list
also manages Encoding objects. Both are accessed/modified by ractors
simultaneously so that they should be synchronized.
For enc_table, this patch introduced GLOBAL_ENC_TABLE_ENTER/LEAVE/EVAL
to access this table with VM lock. To make shortcut, three new global
variables global_enc_ascii, global_enc_utf_8, global_enc_us_ascii are
also introduced.
For rb_encoding_list, we split it to rb_default_encoding_list (256 entries)
and rb_additional_encoding_list. rb_default_encoding_list is fixed sized Array
so we don't need to synchronized (and most of apps only needs it). To manage
257 or more encoding objects, they are stored into rb_additional_encoding_list.
To access rb_additional_encoding_list., VM lock is needed.
If a ractor getting a VM lock (monitor) raises an exception,
unlock can be skipped. To release VM lock correctly on exception
(or other jumps with JUMP_TAG), EC_POP_TAG() releases VM lock.
Using Fiber#transfer with Fiber#resume for a same Fiber is
limited (once Fiber#transfer is called for a fiber, the fiber
can not be resumed more). This restriction was introduced to
protect the resume/yield chain, but we realized that it is too much
to protect the chain. Instead of the current restriction, we
introduce some other protections.
(1) can not transfer to the resuming fiber.
(2) can not transfer to the yielding fiber.
(3) can not resume transferred fiber.
(4) can not yield from not-resumed fiber.
[Bug #17221]
Also at the end of a transferred fiber, it had continued on root fiber.
However, if the root fiber resumed a fiber (and that fiber can resumed
another fiber), this behavior also breaks the resume/yield chain.
So at the end of a transferred fiber, switch to the edge of resume
chain from root fiber.
For example, root fiber resumed f1 and f1 resumed f2, transferred to
f3 and f3 terminated, then continue from the fiber f2 (it was continued
from root fiber without this patch).
It uses random to determine if the bignum is sparse or not.
It is arguable if three-digit samples are enough or not to determine it,
but anyway, consuming Random source implicitly is not good.
I introduced the random sampling mechanism, and I don't know any
significant reason to do so. So, let's remove it.
This change makes the sampling points fixed: 40th, 50th, and 60th
percentiles.