Граф коммитов

401 Коммитов

Автор SHA1 Сообщение Дата
Nobuyoshi Nakada 4b065bbe2b
Move common code to `enc_compatible_latter` 2024-10-05 16:06:54 +09:00
Peter Zhu 176c4bb3c7 Fix corruption of internal encoding string
[Bug #20598]

Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have
their strings corrupted when Encoding.default_internal is set to nil.

Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>
2024-06-27 14:06:40 -04:00
Peter Zhu c6a0d03649 Fix corruption of encoding name string
[Bug #20595]

enc_set_default_encoding will free the C string if the encoding is nil,
but the C string can be used by the encoding name string. This will cause
the encoding name string to be corrupted.

Consider the following code:

    Encoding.default_internal = Encoding::ASCII_8BIT
    names = Encoding.default_internal.names
    p names
    Encoding.default_internal = nil
    p names

It outputs:

    ["ASCII-8BIT", "BINARY", "internal"]
    ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"]

Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>
2024-06-27 09:47:22 -04:00
Jean Boussier 3a7846b1aa Add a hint of `ASCII-8BIT` being `BINARY`
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
2024-04-18 10:17:26 +02:00
Jean Boussier d4f3dcf4df Refactor VM root modules
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm->mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.
2024-03-06 15:33:43 -05:00
Peter Zhu c7ce2f537f Fix memory leak in setting encodings
There is a memory leak in Encoding.default_external= and
Encoding.default_internal= because the duplicated name is not freed
when overwriting.

    10.times do
      1_000_000.times do
        Encoding.default_internal = nil
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

     25664
     41504
     57360
     73232
     89168
    105056
    120944
    136816
    152720
    168576

After:

    9648
    9648
    9648
    9680
    9680
    9680
    9680
    9680
    9680
    9680
2024-01-03 13:31:43 -05:00
Adam Hess 6816e8efcf Free everything at shutdown
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-12-07 15:52:35 -05:00
Jean Boussier 60c924770d Mark Encoding as Write Barrier protected
It doesn't even have a mark function.
It's only about a hundred objects, but not reason
to scan them every time.
2023-02-07 11:48:57 +01:00
Benoit Daloze 6abe20e87b Remove Encoding#replicate 2023-01-11 13:41:41 +01:00
Koichi Sasada dbf77d420d surpress warning
now `enc_table->list` is not a pointer.
2022-12-16 11:12:37 +09:00
Koichi Sasada ae19ac5b5b fixed encoding table
This reduces global lock acquiring for reading.
https://bugs.ruby-lang.org/issues/18949
2022-12-16 10:04:37 +09:00
Benoit Daloze 6525b6f760 Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32
* And simplify callers of get_actual_encoding().
* See [Feature #18949].
* See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-12 14:02:34 +02:00
Benoit Daloze 14bcf69c9c Deprecate Encoding#replicate
* See [Feature #18949].
2022-09-10 19:02:15 +02:00
Takashi Kokubun 5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Jean Boussier d084585f01 Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT
Otherwise it's way too easy to confuse it with US_ASCII.
2022-07-19 08:48:56 +02:00
Burdette Lamar 81741690a0
[DOC] Main doc for encodings moved from encoding.c to doc/encodings.rdoc (#5748)
Main doc for encodings moved from encoding.c to doc/encodings.rdoc
2022-04-01 20:41:04 -05:00
Nobuyoshi Nakada 7459a32af3 suppress warnings for probable NULL dererefences 2021-10-24 19:24:50 +09:00
卜部昌平 5112a54846 include/ruby/encoding.h: convert macros into inline functions
Less macros == huge win.
2021-10-05 14:18:23 +09:00
Jeremy Evans 3f7da458a7 Make encoding loading not issue warning
Instead of relying on setting an unsetting ruby_verbose, which is
not thread-safe, restructure require_internal and load_lock to
accept a warn argument for whether to warn, and add
rb_require_internal_silent to require without warnings.  Use
rb_require_internal_silent when loading encoding.

Note this does not modify ruby_debug and errinfo handling, those
remain thread-unsafe.

Also silent requires when loading transcoders.
2021-10-02 05:51:29 -09:00
S-H-GAMELINKS 18031f4102 Add rb_encoding_check function 2021-08-22 10:39:14 +09:00
S.H 378e8cdad6
Using RBOOL macro 2021-08-02 12:06:44 +09:00
Jean Boussier 7e8a9af9db rb_enc_interned_str: handle autoloaded encodings
If called with an autoloaded encoding that was not yet
initialized, `rb_enc_interned_str` would crash with
a NULL pointer exception.

See: https://github.com/ruby/ruby/pull/4119#issuecomment-800189841
2021-03-22 21:37:48 +09:00
Koichi Sasada 2a3324fcd2 No sync on ASCII/US_ASCCII/UTF-8
rb_enc_from_index(index) doesn't need locking if index specify
ASCII/US_ASCCII/UTF-8.
rb_enc_from_index() is called frequently so it has impact.

                               user     system      total        real
r_parallel/miniruby       174  0.000209   0.000000   5.559872 (  1.811501)
r_parallel/master_mini    175  0.000238   0.000000  12.664707 (  3.523641)

(repeat x1000 `s.split(/,/)` where s = '0,,' * 1000)
2020-12-17 03:44:23 +09:00
Lars Kanis 94b6933d1c
Set default for Encoding.default_external to UTF-8 on Windows (#2877)
* Use UTF-8 as default for Encoding.default_external on Windows
* Document UTF-8 change on Windows to Encoding.default_external

fix https://bugs.ruby-lang.org/issues/16604
2020-12-08 01:48:37 +09:00
Koichi Sasada 5e3259ea74 fix public interface
To make some kind of Ractor related extensions, some functions
should be exposed.

* include/ruby/thread_native.h
  * rb_native_mutex_*
  * rb_native_cond_*
* include/ruby/ractor.h
  * RB_OBJ_SHAREABLE_P(obj)
  * rb_ractor_shareable_p(obj)
  * rb_ractor_std*()
  * rb_cRactor

and rm ractor_pub.h
and rename srcdir/ractor.h to srcdir/ractor_core.h
    (to avoid conflict with include/ruby/ractor.h)
2020-11-18 03:52:41 +09:00
Stefan Stüben 8c2e5bbf58 Don't redefine #rb_intern over and over again 2020-10-21 12:45:18 +09:00
Koichi Sasada a76a30724d Revert "reduce lock for encoding"
This reverts commit de17e2dea1.

This patch can introduce race condition because of conflicting
read/write access for enc_table::default_list. Maybe we need to
freeze default_list at the end of Init_encdb() in enc/encdb.c.
2020-10-20 01:34:17 +09:00
Koichi Sasada de17e2dea1 reduce lock for encoding
To reduce the number of locking for encoding manipulation,
enc_table::list is splited to ::default_list and ::additional_list.
::default_list is pre-allocated and no need locking to access to
the ::default_list. If additional encoding space is needed, use
::additional_list and this list need to use locking.
However, most of case, ::default_list is enough.
2020-10-19 14:06:40 +09:00
Nobuyoshi Nakada 7ffd14a18c
Check encoding name to replicate
https://hackerone.com/reports/954433
2020-10-15 16:48:25 +09:00
Koichi Sasada 102c2ba65f freeze Encoding objects
Encoding objects can be accessed in multi-ractors and there is no
state to mutate. So we can mark it as frozen and shareable.
[Bug #17188]
2020-10-14 14:02:06 +09:00
Koichi Sasada 11c2f0f36c sync enc_table and rb_encoding_list
enc_table which manages Encoding information. rb_encoding_list
also manages Encoding objects. Both are accessed/modified by ractors
simultaneously so that they should be synchronized.

For enc_table, this patch introduced GLOBAL_ENC_TABLE_ENTER/LEAVE/EVAL
to access this table with VM lock. To make shortcut, three new global
variables global_enc_ascii, global_enc_utf_8, global_enc_us_ascii are
also introduced.

For rb_encoding_list, we split it to rb_default_encoding_list (256 entries)
and rb_additional_encoding_list. rb_default_encoding_list is fixed sized Array
so we don't need to synchronized (and most of apps only needs it). To manage
257 or more encoding objects, they are stored into rb_additional_encoding_list.
To access rb_additional_encoding_list., VM lock is needed.
2020-10-14 14:02:06 +09:00
Nobuyoshi Nakada 9e67a38fde
Fallback to built-in UTF-8 for miniruby
Source code encoding is defaulted to UTF-8 now too.
2020-05-16 17:36:30 +09:00
卜部昌平 9e41a75255 sed -i 's|ruby/impl|ruby/internal|'
To fix build failures.
2020-05-11 09:24:08 +09:00
卜部昌平 d7f4d732c1 sed -i s|ruby/3|ruby/impl|g
This shall fix compile errors.
2020-05-11 09:24:08 +09:00
Nobuyoshi Nakada 5d430c1b34
Added more NORETURN declarations 2020-05-11 00:40:14 +09:00
卜部昌平 9e6e39c351
Merge pull request #2991 from shyouhei/ruby.h
Split ruby.h
2020-04-08 13:28:13 +09:00
Nobuyoshi Nakada fce667ed08
Get rid of warnings/exceptions at cleanup
After the encoding index instance variable is removed when all
instance variables are removed in `obj_free`, then `rb_str_free`
causes uninitialized instance variable warning and nil-to-integer
conversion exception.  Both cases result in object allocation
during GC, and crashes.
2020-02-13 12:46:48 +09:00
卜部昌平 f83781c8c1 rb_enc_str_asciionly_p expects T_STRING
This `str2` variable can be non-string (regexp etc.) but the previous
code passed it directly to rb_enc_str_asciionly_p(), which expects its
argument be a string.  Let's enforce that constraint.
2020-02-10 12:19:30 +09:00
卜部昌平 115fec062c more on NULL versus functions.
Function pointers are not void*.  See also
ce4ea956d2
8427fca49b
2020-02-07 14:24:19 +09:00
Lars Kanis a4fca28b80 Fix description of Encoding.default_(in|ex)ternal
Data written to files is not transcoded per default, but only
when default_internal is set.

The default for default_internal is nil and doesn't depend on the
source file encoding.
2020-02-03 08:42:01 -08:00
卜部昌平 5e22f873ed decouple internal.h headers
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead.  This would significantly
speed up incremental builds.

We take the following inclusion order in this changeset:

1.  "ruby/config.h", where _GNU_SOURCE is defined (must be the very
    first thing among everything).
2.  RUBY_EXTCONF_H if any.
3.  Standard C headers, sorted alphabetically.
4.  Other system headers, maybe guarded by #ifdef
5.  Everything else, sorted alphabetically.

Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
2019-12-26 20:45:12 +09:00
QuestionDriven 54be15f325 [Doc] Fix sample in Encoding#names 2019-12-22 23:01:45 +09:00
QuestionDriven 9654241d5d [Doc] Fix wrong example in Encoding.aliases 2019-12-22 23:01:45 +09:00
Jeremy Evans ffd0820ab3 Deprecate taint/trust and related methods, and make the methods no-ops
This removes the related tests, and puts the related specs behind
version guards.  This affects all code in lib, including some
libraries that may want to support older versions of Ruby.
2019-11-18 01:00:25 +02:00
Jeremy Evans c5c05460ac Warn on access/modify of $SAFE, and remove effects of modifying $SAFE
This removes the security features added by $SAFE = 1, and warns for access
or modification of $SAFE from Ruby-level, as well as warning when calling
all public C functions related to $SAFE.

This modifies some internal functions that took a safe level argument
to no longer take the argument.

rb_require_safe now warns, rb_require_string has been added as a
version that takes a VALUE and does not warn.

One public C function that still takes a safe level argument and that
this doesn't warn for is rb_eval_cmd.  We may want to consider
adding an alternative method that does not take a safe level argument,
and warn for rb_eval_cmd.
2019-11-18 01:00:25 +02:00
Nobuyoshi Nakada 8869384367
Moved Init_encoding from wrong place [Bug #16292] 2019-11-05 10:28:01 +09:00
卜部昌平 7e0ae1698d avoid overflow in integer multiplication
This changeset basically replaces `ruby_xmalloc(x * y)` into
`ruby_xmalloc2(x, y)`.  Some convenient functions are also
provided for instance `rb_xmalloc_mul_add(x, y, z)` which allocates
x * y + z byes.
2019-10-09 12:12:28 +09:00
Lars Kanis 9311656914
Better wording for __ENCODING__
"locale encoding" is misleading since it doesn't mean Encoding.find("locale")
but the encoding used to interpret the script file. It's therefore better to
call it "script encoding" as in the paragraphs above.
Closes: https://github.com/ruby/ruby/pull/1611
2019-08-04 09:03:46 +09:00
Lourens Naudé 009ec37a47
Let the index boundary check in rb_enc_from_index be flagged as unlikely
[Misc #15806]

Closes: https://github.com/ruby/ruby/pull/2128
2019-07-23 16:45:54 +09:00
Lourens Naudé 6546aed475
Explicitly initialise encodings on init to remove branches on encoding lookup
[Misc #15806]

Closes: https://github.com/ruby/ruby/pull/2128
2019-07-23 16:45:54 +09:00