github/ruby - ruby

Commit graph

Author	SHA1	Message	Date
Jean Boussier	036ca726bb	Fix documentation for String#index and String#byterindex	2024-09-04 11:26:17 +02:00
Nobuyoshi Nakada	ade240e578	Adjust indents [ci skip]	2024-09-04 10:28:52 +09:00
Jean Boussier	b7fa2dd0d0	rb_enc_str_asciionly_p: avoid always fetching the encoding Profiling of `JSON.dump` shows a significant amount of time spent in `rb_enc_str_asciionly_p`, in large part because it fetches the encoding. It can be made twice as fast in this scenario by first checking the coderange and only falling back to fetching the encoding if the coderange is unknown. Additionally we can skip fetching the encoding for the common popular encodings.	2024-09-03 12:21:36 +02:00
Zack Deveau	e7cb70be4e	Improve String#rindex performance on OSX On OSX, String#rindex is slow due to the lack of `memrchr`. The fallback implementation finds a match by instead doing a `memcmp` on every single character in the search string looking for a substring match. For OSX hosts, this changeset introduces a simple `memrchr` implementation, `rb_memrchr`, that can be used instead. An example benchmark below demonstrates an 8000 char long search string with a 10 char substring near the end. ``` ruby-master | substring near the end | osx UTF-8 user system total real index 0.000111 0.000000 0.000111 ( 0.000110) rindex 0.000446 0.000005 0.000451 ( 0.000454) ``` ``` ruby-patched | substring near the end | osx UTF-8 user system total real index 0.000112 0.000000 0.000112 ( 0.000111) rindex 0.000057 0.000001 0.000058 ( 0.000057) ```	2024-09-03 14:25:25 +09:00
Jean Boussier	4e85b6b4c4	rb_str_bytesplice: skip encoding check if encodings are the same If both strings have the same encoding, all this work is useless.	2024-08-09 22:06:44 +02:00
Jean Boussier	3bac5f6af5	string.c: add fastpath in str_ensure_byte_pos If the string only contains single byte characters we can skip all the costly checks.	2024-08-09 22:06:44 +02:00
Jean Boussier	a332367dad	string.c: Add fastpath to single_byte_optimizable `rb_enc_from_index` is a costly operation so it is worth avoiding calling it for the common encodings. Also in the case of UTF-8, it's more efficient to scan the coderange if it is unknown than to fall back to the slower algorithms.	2024-08-09 22:06:44 +02:00
Jean Boussier	2bd5dc47ac	string.c: str_capacity don't check for immediates `STR_EMBED_P` uses `FL_TEST_RAW` meaning we already assume `str` isn't an immediate, so we can use `FL_TEST_RAW` here too.	2024-08-09 15:20:58 +02:00
Jean Boussier	af44af238b	str_independent: add a fastpath with a single flag check If we assume that most strings we modify are not frozen and are independent, then we can optimize this case by replacing multiple flag checks by a single mask check.	2024-08-09 15:20:58 +02:00
Kevin Menard	04a6165ac0	YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. (#11032) * Document why we need to explicitly spill registers. * Simplify passing a byte value to `str_buf_cat`. * YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. * YJIT: Move runtime type check into YJIT. Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.	2024-08-02 15:45:22 -04:00
Jean Boussier	83f57ca3d2	String.new(capacity:) don't substract termlen [Bug #20585] This was changed in 36a06efdd9f0604093dccbaf96d4e2cb17874dc8 because `String.new(1024)` would end up allocating `1025` bytes, but the problem with this change is that the caller may be trying to right size a String. So instead, we should just better document the behavior of `capacity:`.	2024-06-19 15:11:07 +02:00
Kevin Menard	a119b5f879	Add a fast path implementation for appending single byte values to US-ASCII strings.	2024-06-17 09:44:48 -07:00
Kevin Menard	27e13fbc58	Add a fast path implementation for appending single byte values to binary strings. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2024-06-17 09:44:48 -07:00
Alan Wu	6416ee33eb	Simplify unaligned write for pre-computed string hash	2024-06-13 18:52:09 -04:00
Alan Wu	a8730adb60	rb_str_hash(): Avoid UB with making misaligned pointer Previously, on common platforms, this code made a pointer to a union of 8 byte alignment out of a char pointer that is not guaranteed to satisfy the alignment requirement. That is undefined behavior according to [C99 6.3.2.3p7](https://port70.net/~nsz/c/c99/n1256.html#6.3.2.3p7). Use memcpy() to do the unaligned read instead.	2024-06-13 18:52:09 -04:00
tompng	a9b8981aac	Simplify rb_str_resize clear range condition	2024-06-13 18:27:02 +02:00
tompng	9c7374b0e6	Clear coderange when rb_str_resize change size In some encoding like utf-16 utf-32, expanding the string with null bytes can change coderange to either broken or valid.	2024-06-13 18:27:02 +02:00
Nobuyoshi Nakada	dd8903fed7	[Bug #20566] Mention out-of-range argument cases in `String#<<` Also [Bug #18973].	2024-06-09 10:11:06 +09:00
Jean Boussier	730e3b2ce0	Stop exposing `rb_str_chilled_p` [Feature #20205] Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. We however need to change `rb_check_frozen_internal` to no longer be a macro, as it needs to check for chilled strings.	2024-06-02 13:53:35 +02:00
Nobuyoshi Nakada	7d144781a9	[Bug #20512] Set coderange in `Range#each` of strings	2024-05-28 16:59:51 +09:00
Nobuyoshi Nakada	0a92c9f2b0	Set empty strings to ASCII-only	2024-05-28 16:24:21 +09:00
Jean Boussier	9e9f1d9301	Precompute embedded string literals hash code With embedded strings we often have some space left in the slot, which we can use to store the string Hash code. It's probably only worth it for string literals, as they are the ones likely to be used as hash keys. We chose to store the Hash code right after the string terminator as to make it easy/fast to compute, and not require one more union in RString. ``` compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main `f77618c1fa`) [arm64-darwin23] built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23] last_commit=Precompute embedded string literals hash code | |compare-ruby|built-ruby| |:-----------|-----------:|---------:| |symbol | 39.275M| 39.753M| | | -| 1.01x| |dyn_symbol | 37.348M| 37.704M| | | -| 1.01x| |small_lit | 29.514M| 33.948M| | | -| 1.15x| |frozen_lit | 27.180M| 33.056M| | | -| 1.22x| |iseq_lit | 27.391M| 32.242M| | | -| 1.18x| ``` Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>	2024-05-28 07:32:41 +02:00
Étienne Barrié	1376881e9a	Stop marking chilled strings as frozen They were initially made frozen to avoid false positives for cases such as: str = str.dup if str.frozen? But this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-05-28 07:32:33 +02:00
Jean Boussier	3a7846b1aa	Add a hint of `ASCII-8BIT` being `BINARY` [Feature #18576] Since outright renaming `ASCII-8BIT` is deemed too backward incompatible, the next best thing would be to only change its `#inspect`, particularly in exception messages.	2024-04-18 10:17:26 +02:00
Jean Boussier	f06670c5a2	Eliminate usage of OBJ_FREEZE_RAW Previously it would bypass the `FL_ABLE` check, but since shapes introduction, it started having a different behavior than `OBJ_FREEZE`, as it would only set the `FL_FREEZE` flag, but not update the shape. I have no indication of this causing a bug yet, but it seems like a trap waiting to happen.	2024-04-16 17:20:35 +02:00
Étienne Barrié	49b31c7680	Document STR_CHILLED flag on RString [Feature #20205]	2024-04-08 13:25:09 +02:00
Nobuyoshi Nakada	4dd9e5cf74	Add builtin type assertion	2024-04-08 11:13:29 +09:00
Peter Zhu	e50590a541	Assert that Symbol#inspect returns a T_STRING	2024-04-05 16:15:28 -04:00
KJ Tsanaktsidis	9d0a5148ae	Add missing RB_GC_GUARDs related to DATA_PTR I discovered the problem in `compile.c` from a failing TestIseqLoad#test_stressful_roundtrip test with ASAN enabled. The other two changes in array.c and string.c I found by auditing similar usages of DATA_PTR in the codebase. [Bug #20402]	2024-03-31 20:33:38 +11:00
Étienne Barrié	2b08406cd0	Expose rb_str_chilled_p Some extensions (like stringio) may need to differentiate between chilled strings and frozen strings. They can now use rb_str_chilled_p but must check for its presence since the function will be removed when chilled strings are removed. [Bug #20389] [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-03-26 12:54:54 +01:00
Nobuyoshi Nakada	fdd7ffb70c	[Bug #20389] Chilled string cannot be a shared root	2024-03-25 10:26:56 +09:00
Étienne Barrié	12be40ae6b	Implement chilled strings [Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduces "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than raising a `FrozenError`. Implementation wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explicitly enabled nor disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-03-19 09:26:49 +01:00
Thomas Marshall	7e4b1f8e19	[Bug #20322] Fix rb_enc_interned_str_cstr null encoding The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be a null pointer, but this currently causes a segmentation fault when trying to autoload the encoding. This commit fixes the issue by checking for NULL before calling `rb_enc_autoload`.	2024-03-03 10:43:35 +00:00
Peter Zhu	ce8531fed4	Stop using rb_str_locktmp_ensure publicly rb_str_locktmp_ensure is a private API.	2024-02-23 14:08:29 -05:00
Takashi Kokubun	8a6740c70e	YJIT: Lazily push a frame for specialized C funcs (#10080) * YJIT: Lazily push a frame for specialized C funcs Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> * Fix a comment on pc_to_cfunc * Rename rb_yjit_check_pc to rb_yjit_lazy_push_frame * Rename it to jit_prepare_lazy_frame_call * Fix a typo * Optimize String#getbyte as well * Optimize String#byteslice as well --------- Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>	2024-02-23 19:08:09 +00:00
Peter Zhu	510404f2de	Stop using rb_fstring publicly rb_fstring is a private API, so we should use rb_str_to_interned_str instead, which is a public API.	2024-02-23 13:33:46 -05:00
Peter Zhu	df5b8ea4db	Remove unneeded RUBY_FUNC_EXPORTED	2024-02-23 10:24:21 -05:00
Takashi Kokubun	d5080f6e8b	Fix -Wsign-compare on String#initialize ../string.c:1886:57: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘long int’ [-Wsign-compare] 1886 | if (STR_EMBED_P(str)) RUBY_ASSERT(osize <= str_embed_capa(str)); | ^~	2024-02-22 16:11:30 -08:00
Nobuyoshi Nakada	e04146129e	[Bug #20292] Truncate embedded string to new capacity	2024-02-22 22:46:18 +09:00
Nobuyoshi Nakada	b1d70e4264	[Bug #20280] Check by `rb_parser_enc_str_coderange` Co-authored-by: Yuichiro Kaneko <spiketeika@gmail.com>	2024-02-19 16:33:26 +09:00
Nobuyoshi Nakada	fcc55dc226	[Bug #20280] Raise SyntaxError on invalid encoding symbol	2024-02-19 16:33:26 +09:00
Peter Zhu	4d1b3a2bf3	Unset STR_SHARED when setting string to embed	2024-02-15 12:19:45 -05:00
Yusuke Endoh	25d74b9527	Do not include a backtick in error messages and backtraces [Feature #16495]	2024-02-15 18:42:31 +09:00
Burdette Lamar	65f5435540	[DOC] Doc compliance (#9955)	2024-02-14 10:47:42 -05:00
Alan Wu	6261d4b4d8	Fix use-after-move in Symbol#inspect The allocation could re-embed `orig_str` and invalidate the data pointer from RSTRING_GETMEM() if the string is embedded. Found on CI, where the test introduced in `7002e77694` ("Fix Symbol#inspect for GC compaction") recently failed. See: <https://github.com/ruby/ruby/actions/runs/7880657560/job/21503019659>	2024-02-13 14:49:54 -05:00
Aaron Patterson	c35fea8509	Specialize String#byteslice(a, b) (#9939) * Specialize String#byteslice(a, b) This adds a specialization for String#byteslice when there are two parameters. This makes our protobuf parser go from 5.84x slower to 5.33x slower ``` Comparison: decode upstream (53738 bytes): 7228.5 i/s decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower Comparison: decode upstream (53738 bytes): 7024.8 i/s decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower ``` * Update yjit/src/codegen.rs --------- Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>	2024-02-13 16:20:27 +00:00
Peter Zhu	ac38f259aa	Replace assert with RUBY_ASSERT in string.c assert does not print the bug report, only the file and line number of the assertion that failed. RUBY_ASSERT prints the full bug report, which makes it much easier to debug.	2024-02-12 15:07:47 -05:00
Peter Zhu	c6b391214c	[DOC] Improve flags of string	2024-02-08 10:49:38 -05:00
Peter Zhu	5e0c171451	Make io_fwrite safe for compaction [Bug #20169] Embedded strings are not safe for system calls without the GVL because compaction can cause pages to be locked causing the operation to fail with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire, which guarantees that the return string is not embedded.	2024-02-05 11:11:07 -05:00
Takashi Kokubun	51753ec7fa	Annotate Symbol#to_s as leaf (#9769)	2024-01-31 10:47:35 -05:00
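
Commit e7cb70be4e in the log above describes speeding up `String#rindex` on hosts without `memrchr` by adding a simple backward byte search. The following is a minimal sketch of that general idea, not the code from the commit: the names `sketch_memrchr` and `sketch_rindex` are hypothetical, and the real `rb_memrchr` in string.c may differ in signature and details. It scans backward for the first byte of the needle and only calls `memcmp` at candidate positions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Ruby's rb_memrchr): return a pointer to the
 * last occurrence of byte c in the first n bytes of buf, or NULL. */
static const void *
sketch_memrchr(const void *buf, int c, size_t n)
{
    const unsigned char *base = buf;
    const unsigned char *p = base + n;
    const unsigned char byte = (unsigned char)c;

    while (p > base) {
        if (*--p == byte) return p;
    }
    return NULL;
}

/* Hypothetical reverse substring search built on the helper above.
 * Returns the byte offset of the last occurrence of needle in hay,
 * or -1 if there is none. An empty needle matches at hay_len. */
static long
sketch_rindex(const char *hay, size_t hay_len,
              const char *needle, size_t needle_len)
{
    if (needle_len == 0) return (long)hay_len;
    if (needle_len > hay_len) return -1;

    /* Number of positions at which a full match could still start. */
    size_t window = hay_len - needle_len + 1;

    while (window > 0) {
        /* Jump straight to the last candidate for the first byte. */
        const char *hit = sketch_memrchr(hay, (unsigned char)needle[0], window);
        if (hit == NULL) return -1;

        size_t pos = (size_t)(hit - hay);
        if (memcmp(hit, needle, needle_len) == 0) return (long)pos;

        /* Not a match: keep searching strictly to the left of pos. */
        window = pos;
    }
    return -1;
}
```

For example, `sketch_rindex("hello world hello", 17, "hello", 5)` returns 12, the offset of the second `hello`. The sketch works on raw bytes only; it does not deal with character boundaries in multibyte encodings, which a real `String#rindex` has to handle.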


1893 commits