Profiling revealed that we were spending lots of time growing the buffer.
Buffer operations are definitely something we want to optimize, but for
this specific benchmark what we're interested in is UTF-8 scanning performance.
Each iteration of the two scanning benchmarks was producing 20MB of JSON;
now they only produce 5MB.
Now:
```
== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.0dev (2024-10-18T19:01:45Z master https://github.com/ruby/json/commit/7be9a333ca) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
json 35.000 i/100ms
oj 36.000 i/100ms
rapidjson 10.000 i/100ms
Calculating -------------------------------------
json 359.161 (± 1.4%) i/s (2.78 ms/i) - 1.820k in 5.068542s
oj 359.699 (± 0.6%) i/s (2.78 ms/i) - 1.800k in 5.004291s
rapidjson 99.687 (± 2.0%) i/s (10.03 ms/i) - 500.000 in 5.017321s
Comparison:
json: 359.2 i/s
oj: 359.7 i/s - same-ish: difference falls within error
rapidjson: 99.7 i/s - 3.60x slower
```
https://github.com/ruby/json/commit/1a338532d2
Now that we've inlined the eden_heap into the size_pool, we should
rename the size_pool to heap, so that Ruby contains multiple heaps,
each holding different sized objects.
The term heap, as a collection of memory pages, is more in line with
memory management nomenclature, whereas size_pool was a name chosen
out of necessity during the development of the Variable Width
Allocation features of Ruby.
The concept of size pools was introduced in order to facilitate
different sized objects (other than the default 40 bytes). They wrapped
the eden heap and the tomb heap, and some related state, and provided a
reasonably simple way of duplicating all related concerns, to provide
multiple pools that all shared the same structure but held different
objects.
Since then various changes have happened in Ruby's memory layout:
* The concept of tomb heaps has been replaced by a global free pages list,
with each page having its slot size reconfigured at the point when it
is resurrected
* the eden heap has been inlined into the size pool itself, so that now
the size pool directly controls the free_pages list, the sweeping
page, the compaction cursor and the other state that was previously
being managed by the eden heap.
Now that there is no need for a heap wrapper, we should refer to the
collection of pages containing Ruby objects as a heap again rather than
a size pool.
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.
It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.
We chose to store the Hash code right after the string terminator so as
to make it easy and fast to compute, and to avoid requiring one more
union in RString.
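For illustration (a hypothetical usage, not part of the commit), the main beneficiaries are frozen string literals used repeatedly as hash keys, where the precomputed hash code is reused on every lookup:
```ruby
# frozen_string_literal: true

# Hypothetical example: the "content-type" literal is embedded and frozen,
# so its hash code can be precomputed once and reused for every Hash lookup.
HEADERS = { "content-type" => "text/html" }

def content_type(headers)
  headers["content-type"]
end

content_type(HEADERS) # => "text/html"
```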
```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code
| |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol | 39.275M| 39.753M|
| | -| 1.01x|
|dyn_symbol | 37.348M| 37.704M|
| | -| 1.01x|
|small_lit | 29.514M| 33.948M|
| | -| 1.15x|
|frozen_lit | 27.180M| 33.056M|
| | -| 1.22x|
|iseq_lit | 27.391M| 32.242M|
| | -| 1.18x|
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
These show gains from the recent optimization commits:
```
arg_splat
miniruby: 7346039.9 i/s
miniruby-before: 4692240.8 i/s - 1.57x slower
arg_splat_block
miniruby: 6539749.6 i/s
miniruby-before: 4358063.6 i/s - 1.50x slower
splat_kw_splat
miniruby: 5433641.5 i/s
miniruby-before: 3851048.6 i/s - 1.41x slower
splat_kw_splat_block
miniruby: 4916137.1 i/s
miniruby-before: 3477090.1 i/s - 1.41x slower
splat_kw_block
miniruby: 2912829.5 i/s
miniruby-before: 2465611.7 i/s - 1.18x slower
arg_splat_post
miniruby: 2195208.2 i/s
miniruby-before: 1860204.3 i/s - 1.18x slower
```
zsuper only speeds up in the post argument case, because
it was already set to use splatarray false in cases where
there were no post arguments.
Thanks to the new semantics from [ruby-core:115808], `**nil` is now
equivalent to `**{}`. Since the only thing one can do with an anonymous
keyword rest parameter is to delegate it with `**`, nil is just as good
as an empty hash. Using nil avoids allocating an empty hash.
This is particularly important for `...` methods since they now use
`**kwrest` under the hood after 4f77d8d328. Most calls don't pass
keywords.
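As a sketch of the delegation pattern in question (hypothetical methods, covering both anonymous `**` and `...`):
```ruby
# An anonymous keyword rest parameter can only be forwarded with `**`, so when
# the caller passes no keywords, forwarding nil internally is indistinguishable
# from forwarding an (allocated) empty hash.
def forward_kw(*args, **, &blk)
  target(*args, **, &blk)
end

# `...` forwarding uses `**kwrest` under the hood, so it benefits the same way.
def forward_all(...)
  target(...)
end

def target(*args, **kw)
  [args, kw]
end

forward_all(1, 2) # => [[1, 2], {}]
```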
Comparison:
fw_no_kw
post: 9816800.9 i/s
pre: 8570297.0 i/s - 1.15x slower
The following code previously caused a crash:
```ruby
h = {}
1000000.times{|i| h[i.to_s.to_sym] = i}
def f(kw: 1, **kws) end
f(**h)
```
Inside a thread or fiber, where the machine stack is smaller, a much
smaller keyword splat could still cause a crash.
I found this issue while optimizing method calling by reducing implicit
allocations. Given the following code:
```ruby
def f(kw: , **kws) end
kw = {kw: 1}
f(**kw)
```
The `f(**kw)` call previously allocated two hashes on the callee side
instead of a single hash. This is because `setup_parameters_complex` would extract the
keywords from the keyword splat hash to the C stack, to attempt to mirror
the case when literal keywords are passed without a keyword splat. Then,
`make_rest_kw_hash` would build a new hash based on the extracted keywords
that weren't used for literal keywords.
Switch the implementation so that if a keyword splat is passed, literal keywords
are deleted from the keyword splat hash (or a copy of the hash if the hash is
not mutable).
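Conceptually, the new callee-side behavior is roughly equivalent to this Ruby sketch (an approximation of the C logic, not the actual implementation):
```ruby
# Rough approximation: pull literal keywords out of the keyword splat hash
# (duplicating it first if it must not be mutated), and let whatever remains
# become the **kws rest hash, instead of building a second hash.
def extract_keywords(splat_hash, literal_keys, mutable:)
  rest = mutable ? splat_hash : splat_hash.dup
  literals = {}
  literal_keys.each do |key|
    literals[key] = rest.delete(key) if rest.key?(key)
  end
  [literals, rest]
end

extract_keywords({ kw: 1, other: 2 }, [:kw], mutable: false)
# => [{ kw: 1 }, { other: 2 }]
```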
In addition to avoiding the crash, this new approach is much more
efficient in all cases. With the included benchmark:
```
1
miniruby: 5247879.9 i/s
miniruby-before: 2474050.2 i/s - 2.12x slower
1_mutable
miniruby: 1797036.5 i/s
miniruby-before: 1239543.3 i/s - 1.45x slower
10
miniruby: 1094750.1 i/s
miniruby-before: 365529.6 i/s - 2.99x slower
10_mutable
miniruby: 407781.7 i/s
miniruby-before: 225364.0 i/s - 1.81x slower
100
miniruby: 100992.3 i/s
miniruby-before: 32703.6 i/s - 3.09x slower
100_mutable
miniruby: 40092.3 i/s
miniruby-before: 21266.9 i/s - 1.89x slower
1000
miniruby: 21694.2 i/s
miniruby-before: 4949.8 i/s - 4.38x slower
1000_mutable
miniruby: 5819.5 i/s
miniruby-before: 2995.0 i/s - 1.94x slower
```
To avoid stack overflow, Ruby splits compilation of large arrays
into smaller arrays, and concatenates the small arrays together.
It previously used newarray/concatarray for this, which is
inefficient. This switches the compilation to use pushtoarray,
which is much faster. This makes almost all literal arrays only
allocate a single array.
For cases where there is a large amount of static values in the
array, Ruby will statically compile subarrays, and previously
added them using concatarray. This switches to concattoarray,
avoiding an array allocation for the append.
Keyword splats are also supported in arrays, and ignored if the
keyword splat is empty. Previously, this used newarraykwsplat and
concatarray. This still uses newarraykwsplat, but switches to
concattoarray to save an allocation. So large arrays with keyword
splats can allocate 2 arrays instead of 1.
Previously, for the following array sizes (assuming local variable
access for each element), Ruby allocated the following number of
arrays:
1000 elements: 7 arrays
10000 elements: 79 arrays
100000 elements: 781 arrays
With these changes, only a single array is allocated (or 2 for a
large array with a keyword splat).
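As a rough way to observe this (a hand-rolled check, not the included benchmark), one can compare T_ARRAY counts around building a large literal array of local variable reads:
```ruby
# Build a literal array of 10000 local variable reads and count how many
# T_ARRAY objects get allocated while evaluating it (compare with the
# counts quoted above).
build = eval("proc { x = 1; [#{(['x'] * 10_000).join(', ')}] }")

GC.disable
before = ObjectSpace.count_objects[:T_ARRAY]
build.call
after = ObjectSpace.count_objects[:T_ARRAY]
GC.enable

puts "arrays allocated: #{after - before}"
```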
Results using the included benchmark:
```
array_1000
miniruby: 34770.0 i/s
./miniruby-before: 10511.7 i/s - 3.31x slower
array_10000
miniruby: 4938.8 i/s
./miniruby-before: 483.8 i/s - 10.21x slower
array_100000
miniruby: 727.2 i/s
./miniruby-before: 4.1 i/s - 176.98x slower
```
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
This follows the same approach used for attr_reader/attr_writer in
2d98593bf5, skipping the check for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.
Fixes [Bug #18886]
`String#+@` is 2-3 times faster than `String#dup` because it can
directly go through `rb_str_dup` instead of using the generic and
much slower `rb_obj_dup`.
This fact led to the existence of the ugly `Performance/UnfreezeString`
rubocop performance rule, which encourages users to rewrite the much
more readable and convenient `"foo".dup` into the ugly `(+"foo")`.
Let's make that rubocop rule useless.
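For reference, the two spellings in question, both of which produce a mutable copy of a frozen literal:
```ruby
# frozen_string_literal: true

# Both return a new, unfrozen string; with this change they should perform
# about the same.
a = +"foo"
b = "foo".dup

a.frozen?  # => false
b.frozen?  # => false
a << "bar" # => "foobar"
```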
```
compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22]
last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
warming up..
| |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|uplus | 16.312M| 16.332M|
| | -| 1.00x|
|dup | 5.912M| 16.329M|
| | -| 2.76x|
```
When an inline cache misses, it is very likely that the stale shape_id
and the current instance shape_id have a close common ancestor.
For example, if the instance variable is sometimes frozen and sometimes
not, one of the two shapes will be the direct parent of the other.
Another pattern that commonly causes IC misses is memoization:
in such cases the object will have a "base common shape" and then
a number of close descendants.
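The memoization pattern looks like this (illustrative example, not from the commit):
```ruby
# Before the first #settings call the object has the "base" shape; afterwards
# it has a close descendant shape that also includes @settings. An inline
# cache that sees both shapes can find their common ancestor within a couple
# of steps.
class Config
  def initialize(path)
    @path = path
  end

  def settings
    @settings ||= load_settings
  end

  private

  def load_settings
    { verbose: true } # placeholder for the real work
  end
end
```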
In addition, when we find a common ancestor, we store it in the
inline cache instead of the current shape. This helps prevent the
cache from flip-flopping, ensuring the next lookup will be marginally
faster, and more generally avoids writing to memory too much.
However, now that shapes have an ancestors index, we only check
a few ancestors before falling back to using the index.
So overall this change speeds up what is assumed to be the more common
case, but makes what is assumed to be the less common case a bit slower.
```
compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master 701ca070b4) [arm64-darwin22]
built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22]
warming up......
| |compare-ruby|built-ruby|
|:------------------------------------|-----------:|---------:|
|vm_ivar_stable_shape | 11.672M| 11.679M|
| | -| 1.00x|
|vm_ivar_memoize_unstable_shape | 7.551M| 10.506M|
| | -| 1.39x|
|vm_ivar_memoize_unstable_shape_miss | 11.591M| 11.624M|
| | -| 1.00x|
|vm_ivar_unstable_undef | 9.037M| 7.981M|
| | 1.13x| -|
|vm_ivar_divergent_shape | 8.034M| 6.657M|
| | 1.21x| -|
|vm_ivar_divergent_shape_imbalanced | 10.471M| 9.231M|
| | 1.13x| -|
```
Co-Authored-By: John Hawthorn <john@hawthorn.email>
This is an experimental commit that uses a functional red-black tree to
create an index of the ancestor shapes. It uses an Okasaki style
functional red black tree:
https://www.cs.tufts.edu/comp/150FP/archive/chris-okasaki/redblack99.pdf
This tree is advantageous because:
* It offers O(log n) insertions and O(log n) lookups.
* It shares memory with previous "versions" of the tree
When we insert a node in the tree, only the parts of the tree that need
to be rebalanced are newly allocated. Parts of the tree that don't need
to be rebalanced are not reallocated, so "new trees" are able to share
memory with old trees. This is in contrast to a sorted set where we
would have to duplicate the set, and also resort the set on each
insertion.
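For illustration, here is a minimal Okasaki-style persistent red-black tree sketched in Ruby (the real index lives in the VM's C code; this only shows how insertion copies the search path while sharing untouched subtrees between versions):
```ruby
# Minimal Okasaki-style persistent red-black tree (illustrative sketch).
# Inserting copies only the nodes on the search path; untouched subtrees are
# shared between the old and new trees.
Node = Struct.new(:color, :left, :key, :value, :right)

def red?(node)
  node && node.color == :red
end

def paint(node, color)
  Node.new(color, node.left, node.key, node.value, node.right)
end

# Okasaki's four rebalancing cases: a black node with a red child and a red
# grandchild is rewritten as a red node with two black children.
def balance(color, left, key, value, right)
  if color == :black
    if red?(left) && red?(left.left)
      l, ll = left, left.left
      return Node.new(:red, paint(ll, :black), l.key, l.value,
                      Node.new(:black, l.right, key, value, right))
    elsif red?(left) && red?(left.right)
      l, lr = left, left.right
      return Node.new(:red, Node.new(:black, l.left, l.key, l.value, lr.left),
                      lr.key, lr.value,
                      Node.new(:black, lr.right, key, value, right))
    elsif red?(right) && red?(right.left)
      r, rl = right, right.left
      return Node.new(:red, Node.new(:black, left, key, value, rl.left),
                      rl.key, rl.value,
                      Node.new(:black, rl.right, r.key, r.value, r.right))
    elsif red?(right) && red?(right.right)
      r = right
      return Node.new(:red, Node.new(:black, left, key, value, r.left),
                      r.key, r.value, paint(r.right, :black))
    end
  end
  Node.new(color, left, key, value, right)
end

def ins(node, key, value)
  return Node.new(:red, nil, key, value, nil) if node.nil?
  if key < node.key
    balance(node.color, ins(node.left, key, value), node.key, node.value, node.right)
  elsif key > node.key
    balance(node.color, node.left, node.key, node.value, ins(node.right, key, value))
  else
    Node.new(node.color, node.left, key, value, node.right)
  end
end

# Returns a new root; the previous root remains valid and unchanged.
def insert(tree, key, value)
  paint(ins(tree, key, value), :black)
end

v1 = insert(nil, 10, :a)
v2 = insert(v1, 20, :b) # v1 is still usable; structure is shared where possible
```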
I've added a new stat to RubyVM.stat so we can understand how the red
black tree increases.
On Range#bsearch for endless ranges, we try positions at `begin + 2**i` (i = 0, 1, 2, ...)
to find a point that satisfies a given condition.
Subsequently, we perform a binary search over the interval `[begin, begin + 2**n]`.
However, the interval `[begin + 2**(n-1), begin + 2**n]` is sufficient for binary search
because `begin + 2**(n-1)` does not satisfy the condition.
The same applies to beginless ranges.
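For reference, the kind of search being described, in find-minimum mode on an endless range:
```ruby
# The doubling phase probes begin + 1, begin + 2, begin + 4, ... until the
# block returns true, then binary search narrows the answer within the last
# interval.
(1..).bsearch { |x| x * x >= 1_000 } # => 32
```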
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
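For context, `String#rpartition` splits around the last occurrence of the separator; its behavior is unchanged, only the internal index bookkeeping moved:
```ruby
"lib/ruby/string.rb".rpartition("/") # => ["lib/ruby", "/", "string.rb"]
```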
In most cases `sort_by` works on primitive types.
Using `qsort_r` with a function pointer is much slower than comparing the data directly.
I implemented an introsort which compares primitive data directly for `sort_by`.
We can even afford an O(n) type check before the primitive data sort;
it is still faster.
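The fast path applies when the block returns primitive keys such as Integers or Floats (illustrative usage):
```ruby
# The keys here are Integers, so they can be compared directly instead of
# going through a comparison callback for every pair.
words = %w[ruby c ocaml go]
words.sort_by { |w| w.length } # => ["c", "go", "ruby", "ocaml"]
```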
CALLER_ARG_SPLAT is not necessary for method_missing. We just need
to unshift the method name into the arguments.
This optimizes all method_missing calls:
* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%
Note that empty argument splats do get slower with this approach,
by about 30-40%. Aside from empty argument splats, argument splats
are faster, with the speedup depending on the number of arguments.
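For illustration, the call shapes being optimized look like this with a hypothetical method_missing-based delegator (not the benchmark code):
```ruby
class Wrapper
  def initialize(target)
    @target = target
  end

  # Every forwarded call goes through the method_missing calling path that
  # this change optimizes.
  def method_missing(name, *args, **kw, &blk)
    @target.public_send(name, *args, **kw, &blk)
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private)
  end
end

w = Wrapper.new([3, 1, 2])
w.sort          # mm(recv)
args = [4, 5]
w.push(*args)   # mm(recv, *args)
```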
Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary. As long as the receiver argument
can be shifted off, other arguments are passed through as-is.
This optimizes the following types of calls:
* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%
Note that empty argument splats do get slower with this approach,
by about 2-3%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Aside
from empty argument splats, argument splats are faster, with the
speedup depending on the number of arguments.
The following types of calls are not optimized:
* symproc.(*args)
* symproc.(*args, **kw)
This is because you cannot shift the receiver argument off
without first splatting the arg.
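Here, "symproc" means a Proc created from a Symbol, called with the receiver as its first argument (illustrative usage):
```ruby
# The first argument becomes the receiver and is shifted off; the remaining
# arguments are passed through to the method.
concat = :concat.to_proc
concat.(+"x")          # symproc.(recv)         => "x"
args = %w[a b]
concat.(+"x", *args)   # symproc.(recv, *args)  => "xab"
```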
Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT if not necessary. As long as the method argument
can be shifted off, other arguments are passed through as-is.
This optimizes the following types of calls:
* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%
Note that empty argument splats do get slower with this approach,
by about 20%. This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Aside
from empty argument splats, argument splats are faster, with the
speedup depending on the number of arguments.
The following types of calls are not optimized:
* send(*args)
* send(*args, **kw)
This is because you cannot shift the method argument off
without first splatting the arg.
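Concretely, the optimized send shapes look like this (using Array#push as the target method for illustration):
```ruby
a = []
args = [1, 2, 3]
kw = {}

a.send(:push, 4)            # send(meth, arg)
a.send(:push, *args)        # send(meth, *args)
a.send(:push, *args, **kw)  # send(meth, *args, **kw) with empty keywords
```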
This optimizes the following calls:
* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty
This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly,
but that could cause problems if the array was modified during
the call to the function. You could do that optimization for frozen
arrays, but as splatting frozen arrays is uncommon, and the speedup
is minimal (<5%), it doesn't seem worth it.
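Concretely, the call shapes in question (using Kernel#p, a C function, as a stand-in callee):
```ruby
a = [1, 2, 3]
kw = {}

p(*a)        # splat where `a` does not end with a flagged keywords hash
p(*a, **kw)  # splat followed by an empty keyword splat
```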
The vm_send_cfunc benchmark has been updated to test additional cfunc
call types, and the numbers above were taken from the benchmark results.