After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit uses rb_gvar_getter_t /
rb_gvar_setter_t for rb_define_hooked_variable /
rb_define_virtual_variable which revealed lots of function prototype
inconsistencies. Some of them were literally decades old, going back
to dda5dc00cf.
Also, remove documentation about returning self, which makes no
sense as self would be the Regexp class. It could be interpreted
as return the argument if no changes were made, but that hasn't
been the behavior at least since 1.8.7 (and probably before).
Fixes [Bug #10239]
It seems that decades ago, ruby was written under assumption that
char is unsigned. Which is of course a false assumption. We
need to explicitly store a numeric value into an unsigned char
variable to tell we expect 0..255 value.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
read_escaped_byte() returns values of range -1...256. -1 indicates
error. So the function basically expects char to be 0..255 range.
There is no such guarantee. `char` is not always unsigned. We
need to explicitly declare chbuf to be unsigned char.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65677 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_str_with_term): change terminator.
* re.c (rb_reg_s_union): terminator in source string does not need
to be escaped. terminators are outside of regexp source itself.
[ruby-core:86149] [Bug #14608]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62779 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (unescape_nonascii): escaped multibyte character should be
copied as-is, just with checking if the encoding matches.
https://twitter.com/sakuro/status/972014409986883584
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62718 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark reslts
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The "n" option for regexp, /.../n, is historical.
It doesn't mean the regexp works as binary match since Ruby 1.9.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57759 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Follow r49675, r57098 and r57110. Don't assume RMatch::regexp always
contains a valid Regexp instance; it will be Qnil if the MatchData is
created by rb_backref_set_string(). [ruby-core:78741] [Bug #13054]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57123 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (match_backref_number, namev_to_backref_number): use
RB_TYPE_P instead of switching by TYPE.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57114 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (namev_to_backref_number, rb_reg_regsub): extract name to
backref number check as NAME_TO_NUMBER.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (match_backref_number): use name_to_backref_number for
casts.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57110 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_regsub): other than regexp has no name references.
[ruby-core:78686] [Bug #13042]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57098 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_str_match_m_p): inverse of Regexp#match?. based on
the patch by Herwin Weststrate <herwin@snt.utwente.nl>.
[Fix GH-1483] [Feature #12898]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_match_m_p): consider char boundary. rb_str_subpos
does not adjust to the boundary if len == 0.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57051 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Follow r16757 ("* re.c: fix SEGV by Regexp.allocate.names,
Match.allocate.names, etc.", 2008-06-02). Don't do null dereference if
MatchData#hash or #== is called against an uninitialized instance.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56994 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Don't discard the hash value computed for the regexp object. It seems it
was simply missed out in r24754, when MatchData#hash was initially
implemented.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56962 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
a hash value of Object might be Bignum, but it causes many troubles
expecially the Object is used as a key of a hash. so I've gave up
to do so.
* array.c (rb_ary_hash): use above macro.
* bignum.c (rb_big_hash): ditto.
* hash.c (rb_obj_hash, rb_hash_hash): ditto.
* numeric.c (rb_dbl_hash): ditto.
* proc.c (proc_hash): ditto.
* re.c (rb_reg_hash, match_hash): ditto.
* string.c (rb_str_hash_m): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_match_m_p): [DOC] fix return value in rdoc.
* test/ruby/test_regexp.rb (TestRegexp#test_match_p): add some
tests from document.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55075 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_match_m_p): fix match against empty string.
rb_str_offset returns the end when the position exceeds the
length. fix the range parameter of onig_search.
[ruby-core:75604] [Bug #12394]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55069 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_match_m_p): should return nil if no match, as the
document says. [Feature #8110]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55067 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_match_m_p): fix type of variable for onig_search
result.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55062 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
temporary array.
* re.c (match_ary_aref): get element(s) of match array without creating
temporary array.
* re.c (match_aref): Use match_ary_subseq with handling irregulars.
* re.c (match_values_at): Use match_ary_aref.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
has coderange information.
* re.c (rb_reg_prepare_enc): add shortcut path when the regexp has
the same encoding of given string.
* re.c (rb_reg_prepare_re): avoid duplicated allocation of
onig_errmsg_buffer.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54886 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_initialize): must copy the source string content,
it is not a static literal.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53738 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_fstring_enc_new, rb_fstring_enc_cstr): functions to
make fstring with encoding.
* re.c (rb_reg_initialize): make fstring without copying.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53736 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
if given string is ASCII only.
121.2s to 113.9s on my x86_64-freebsd10.2 Intel Core i5 661
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53720 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* variable.c (rb_f_global_variables): add matched back references
only, as well as defiend? operator.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53534 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* encindex.h: separate encoding index constants from internal.h.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51861 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_memsearch_wchar, rb_memsearch_qchar): test matching
till the end of string. [ruby-core:70592] [Bug #11488]
* test/ruby/test_m17n.rb (test_include?, tet_index): add tests by
Tom Stuart.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51685 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_memsearch): should match only char boundaries in wide
character encodings. [ruby-core:70220] [Bug #11413]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51470 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/encoding.h (ENC_CODERANGE_CLEAN_P): predicate that
tells if the coderange is clean, that is 7bit or valid, and no
needs to scrub.
* re.c (rb_reg_expr_str): use ENC_CODERANGE_CLEAN_P.
* string.c (enc_strlen, rb_enc_cr_str_buf_cat, rb_str_scrub):
ditto.
* string.c (rb_str_enumerate_chars): ditto, and suppress a warning
by gcc6.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51278 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* range.c (range_step, range_each): String#upto should never
modifies the receiver, use frozen strings to enumerate symbols.
* re.c (reg_operand): matching target is not modified.
* ext/socket/constants.c (constant_arg): str_to_int never modifies
argument strings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@50306 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (match_aref): RMatch::regexp is Qnil after matching by a
string since r45451. [ruby-core:68209] [Bug #10877]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49675 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (rb_reg_region_copy): new function to try with GC if copy
failed and return the error.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48672 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (CHECK_REGION_COPIED): onig_region_copy() can fail when
memory exhausted but returns nothing, so check by if allocated.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (unescape_nonascii): make dynamically compiled US-ASCII
regexps ASCII-8BIT encoding if binary (hexadecimal, control,
meta) escapes are contained, as well as literal regexps.
[ruby-dev:48626] [Bug #10382]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47992 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* process.c (rlimit_resource_type, rlimit_resource_value): get rid
of function calls in RSTRING_PTR(), as it evaluates the argument
twice.
* re.c (match_backref_number): ditto.
* signal.c (esignal_init, rb_f_kill, trap_signm): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47006 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* re.c (match_aref): should not ignore name after NUL byte.
[ruby-dev:48275] [Bug #9902]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46344 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* string.c (rb_pat_search): match result should be infected by the
pattern.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e