Commit Graph

93 Commits

Author SHA1 Message Date
duerst 81515b2381 * enc/unicode.c, test/ruby/enc/test_case_mapping.rb: Implemented :fold
option for String#downcase by using case folding data from
  regular expression engine, and added a few simple tests.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-06 05:37:29 +00:00
duerst b658249cef * enc/unicode.c: Activated :ascii flag for ASCII-only case conversion
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53740 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-02-04 12:05:23 +00:00
duerst a7c987968d * enc/unicode.c: Fixed bit mask in macro OnigCodePointCount
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53670 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-27 09:54:38 +00:00
duerst 415949faba * enc/unicode.c: Protect code point count by macro, in order to
be able to use the remaining bits for flags.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-27 08:55:40 +00:00
duerst f307d1fe21 * enc/unicode.c: Fixed a logical error and some comments.
* test/ruby/enc/test_case_mapping.rb: Made tests more general.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53564 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 11:10:45 +00:00
nobu 39f44f0113 get rid of non-ascii chars
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53563 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 09:03:11 +00:00
duerst 959bbb6f72 * enc/unicode.c: Removed artificial expansion for Turkic,
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
  capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
  (with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-17 08:42:16 +00:00
duerst c12af76763 * enc/unicode.c: Artificial mapping to test buffer expansion code.
* string.c: Fixed buffer expansion logic.
* test/ruby/enc/test_case_mapping.rb: Tests for above.
(with Kimihito Matsui)


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53554 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 08:24:58 +00:00
hsbt 219467abde * enc/unicode.c: fix implicit conversion error with clang. fixup r53548.
* string.c: ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:51:58 +00:00
duerst be897c2507 * string.c, enc/unicode.c: New code path as a preparation for Unicode-wide
case mapping. The code path is currently guarded by the :lithuanian
  option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2016-01-16 01:24:03 +00:00
naruse 9ed1d63f41 * regcomp.c, regenc.c, regexec.c, regint.h, enc/unicode.c:
Merge Onigmo 58fa099ed1a34367de67fb3d06dd48d076839692
  + https://github.com/k-takata/Onigmo/pull/52

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-11-26 08:31:27 +00:00
nobu c56dd7fd05 unicode.c: no st.h
* enc/unicode.c: no longer use st.h.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51711 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-08-27 19:50:23 +00:00
nobu 8b9ad9d6f0 oniguruma.h: constify
* include/ruby/oniguruma.h (OnigEncodingTypeST): constify
  property_name_to_ctype arguments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51710 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-08-27 19:49:45 +00:00
naruse d2a5354255 * reg*.c: Merge Onigmo 5.15.0 38a870960aa7370051a3544
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47598 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-09-15 16:18:41 +00:00
nobu 046831094b constify rb_encoding and OnigEncoding
* include/ruby/encoding.h: constify `rb_encoding` arguments.
* include/ruby/oniguruma.h: constify `OnigEncoding` arguments.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46309 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-06-01 22:06:11 +00:00
nobu dd20f90408 unicode.c: no initialization
* enc/unicode.c (init_case_fold_table): no longer need to
  initialize tables at runtime.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46273 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:58:34 +00:00
nobu 7e67b39679 case-folding.rb: perfect hash for case unfolding3
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
  lookup case unfolding table 3.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:58:24 +00:00
nobu 8f59867651 case-folding.rb: perfect hash for case unfolding2
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
  lookup case unfolding table 2.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:58:14 +00:00
nobu 35348a0806 case-folding.rb: perfect hash for case unfolding1
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
  lookup case unfolding table 1.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46270 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:58:01 +00:00
nobu c39e659263 case-folding.rb: perfect hash for case folding
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
  lookup case folding table.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:57:45 +00:00
nobu 88eae35862 case-folding.rb: merge tables
* enc/unicode/case-folding.rb (print_table): merge non-locale and
  locale tables, and reduce initializing loops.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46268 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-30 23:56:00 +00:00
nobu 90fb7538f8 enc/unicode.c: lookup functions
* enc/unicode.c (onigenc_unicode_{fold,unfold{1,2,3}}_lookup):
  abstract lookup functions.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-23 04:44:15 +00:00
nobu dcaf699ee9 enc/unicode.c: constify
* enc/unicode.c (code{2,3}_{cmp,hash}): constify and adjust
  argument types.

* enc/unicode.c (onigenc_unicode_fold_lookup): constify.

* enc/unicode.c (onigenc_unicode_apply_all_case_fold): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-05-23 04:37:02 +00:00
naruse 291fa223cf * regparse.c (is_onechar_cclass): optimize character class
Merge Onigmo 27278c12e6674043cc8affca6507e20e119a86ee.

* regparse.c (is_onechar_cclass): [bug] unexpected match occurs when a
  char class contains no char

* enc/unicode.c (init_case_fold_table): define the sizes of case
  folding tables in casefold.h

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-02-29 16:29:06 +00:00
naruse 0424e152c6 * Merge Onigmo-5.13.1. [ruby-dev:45057] [Feature #5820]
https://github.com/k-takata/Onigmo
  cp reg{comp,enc,error,exec,parse,syntax}.c reg{enc,int,parse}.h
  cp oniguruma.h
  cp tool/enc-unicode.rb
  cp -r enc/

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34663 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-02-17 07:42:23 +00:00
naruse be276c140d * enc/unicode.c (PROPERTY_NAME_MAX_SIZE): +1.
reported by Ken Takata. [ruby-dev:44894][Bug #5652]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@33797 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-11-20 13:44:11 +00:00
nobu 3a47cf3395 * remove trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@31573 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2011-05-15 11:55:52 +00:00
naruse e1d5d4e7f2 * enc/unicode.c (onigenc_unicode_property_name_to_ctype):
remove useless assignment.

* vm.c (vm_make_proc_from_block): ditto.

* variable.c (rb_ivar_count): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29405 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-10-03 22:57:23 +00:00
matz db37773e13 * include/ruby/oniguruma.h: updated to follow Oniguruma 5.9.2.
* re.c (make_regexp): use onig_new() instead of onig_alloc_init().

* re.c (rb_reg_to_s): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26791 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-03-01 21:54:59 +00:00
naruse ee4b59a419 * unicode.c (onigenc_unicode_property_name_to_ctype):
ignore case of properties.

* tool/enc-unicode.rb: downcase properties list.

* enc/unicode/name2ctype.h, enc/unicode/name2ctype.h.blt,
  enc/unicode/name2ctype.kwd, enc/unicode/name2ctype.src:
  follow above.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24836 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-09-10 22:54:01 +00:00
nobu 31b7ae00c0 * include/ruby/st.h (st_hash_func): use st_index_t.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24792 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-09-08 13:10:04 +00:00
naruse 6e6a28183a * unicode.c (PROPERTY_NAME_MAX_SIZE): use MAX_WORD_LENGTH.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24677 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-26 17:01:10 +00:00
nobu 1af43ae867 * enc/unicode.c (onigenc_unicode_mbc_case_fold): balanced braces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24658 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-26 00:48:49 +00:00
naruse f1eff95745 Update Oniguruma's UnicodeData to 5.1.
* tool/enc-unicode.rb: added for generate name2ctype.kwd.
  contributed by Run Paint Run Run [ruby-core:24775]
  use like following:
    ruby19 tool/enc-unicode.rb enc/unicode/UnicodeData.txt \
      enc/unicode/Scripts.txt > enc/unicode/name2ctype.kwd

* enc/unicode.c (CodeRanges): move definitions to name2ctype.h.

* enc/unicode/name2ctype.h.blt, enc/unicode/name2ctype.kwd,
  enc/unicode/name2ctype.src: updated to v5.1.

* enc/unicode/UnicodeData.txt, enc/unicode/Scripts.txt: added v5.1.

* Makefile.in: add rule to generate name2ctype.kwd from
  UnicodeData.txt and Scripts.txt.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24651 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-25 16:15:38 +00:00
nobu a7b920686a * enc/unicode/name2ctype.h: split from enc/unicode.c and made a
perfect hash.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-21 08:01:09 +00:00
nobu e1c9ac6bd9 * enc/unicode.c (CodeRanges): initialized statically.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-19 02:32:49 +00:00
akr 081c802cb9 * grapheme cluster implementation reverted. [ruby-dev:36375]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19417 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-18 12:53:25 +00:00
akr a67d4fa01c * include/ruby/oniguruma.h (OnigEncodingTypeST): add precise_ret
argument for mbc_to_code.
  (ONIGENC_MBC_TO_CODE): provide NULL for precise_ret.
  (ONIGENC_MBC_PRECISE_CODEPOINT): defined.

* include/ruby/encoding.h (rb_enc_mbc_precise_codepoint): defined.

* regenc.h (onigenc_single_byte_mbc_to_code): precise_ret argument
  added.
  (onigenc_mbn_mbc_to_code): ditto.

* regenc.c (onigenc_single_byte_mbc_to_code): precise_ret argument
  added.
  (onigenc_mbn_mbc_to_code): ditto.

* string.c (count_utf8_lead_bytes_with_word): removed.
  (str_utf8_nth): removed.
  (str_utf8_offset): removed.
  (str_strlen): UTF-8 codepoint oriented optimization removed.
  (rb_str_substr): ditto.
  (enc_succ_char): use rb_enc_mbc_precise_codepoint.
  (enc_pred_char): ditto.
  (rb_str_succ): ditto.

* encoding.c (rb_enc_ascget): check length with
  rb_enc_mbc_precise_codepoint.
  (rb_enc_codepoint): use rb_enc_mbc_precise_codepoint.

* regexec.c (string_cmp_ic): add text_end argument.
  (match_at): check end of character after exact string matches.

* enc/utf_8.c (graphme_table): defined for extended grapheme cluster
  boundary.
  (grapheme_cmp): defined.
  (get_grapheme_properties): defined.
  (grapheme_boundary_p): defined.
  (MAX_BYTES_LENGTH): defined.
  (comb_char_enc_len): defined.
  (mbc_to_code0): extracted from mbc_to_code.
  (mbc_to_code): use mbc_to_code0.
  (left_adjust_combchar_head): defined.
  (utf_8): use an extended grapheme cluster as a unit.

* enc/unicode.c (onigenc_unicode_mbc_case_fold): use
  ONIGENC_MBC_PRECISE_CODEPOINT to extract codepoints.
  (onigenc_unicode_get_case_fold_codes_by_str): ditto.

* enc/euc_jp.c (mbc_to_code): follow mbc_to_code field change.
  use onigenc_mbn_mbc_to_code.

* enc/shift_jis.c (mbc_to_code): ditto.

* enc/emacs_mule.c (mbc_to_code): ditto.

* enc/gbk.c (gbk_mbc_to_code): follow mbc_to_code field and
  onigenc_mbn_mbc_to_code change.

* enc/cp949.c (cp949_mbc_to_code): ditto.

* enc/big5.c (big5_mbc_to_code): ditto.

* enc/euc_tw.c (euctw_mbc_to_code): ditto.

* enc/euc_kr.c (euckr_mbc_to_code): ditto.

* enc/gb18030.c (gb18030_mbc_to_code): ditto.

* enc/utf_32be.c (utf32be_mbc_to_code): follow mbc_to_code field
  change.

* enc/utf_16be.c (utf16be_mbc_to_code): ditto.

* enc/utf_32le.c (utf32le_mbc_to_code): ditto.

* enc/utf_16le.c (utf16le_mbc_to_code): ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19389 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-16 16:48:05 +00:00
mame d7baa40cac * enc/euc_jp.c (property_name_to_ctype): core dumped when sizeof(int)
differs from sizeof(long).

* enc/shift_jis.c (property_name_to_ctype): ditto.

* enc/unicode.c (onigenc_unicode_property_name_to_ctype): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@17381 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-06-17 13:06:34 +00:00
naruse 6e1c3a0f54 * enc/koi8_u.c: added.
* regenc.c, enc/utf_8.c, enc/unicode.c, enc/gb18030.c: add ARG_UNUSED.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15130 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-19 15:37:06 +00:00
nobu dca4de6838 * regenc.c (onigenc_strlen_null, onigenc_str_bytelen_null): suppressed
warnings.

* regenc.h, enc/unicode.c (onigenc_unicode_ctype_code_range): added
  encoding argument.

* enc/utf{16,32}_{be,le}.c: added init functions.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14946 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-08 06:40:33 +00:00
matz 52ed8c4edd * include/ruby/oniguruma.h: Oniguruma 1.9.1 merged.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14874 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-03 15:55:04 +00:00
nobu 72483cdcca * Makefile.in, */Makefile.sub (VPATH): add enc directory.
* common.mk (ENCOBJS): encoding objects.

* enc: directory for encodings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13675 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-10 21:35:45 +00:00