case mapping methods.
* enc/unicode.c: Check for invalid string and signal with negative
length value.
* test/ruby/enc/test_case_mapping.rb: Add tests for above.
* test/ruby/test_m17n_comb.rb: Add a message to clarify test failure.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55253 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
swapcase functionality for titlecase characters. Swapcase isn't defined
by Unicode, because the purpose/usage of swapcase is unclear anyway.
The implementation follows a proposal from Nobu, swapping the case of
each component of a titlecase character individually.
This means that the titlecase characters have to be decomposed.
* enc/unicode.c: Code using the above data.
* test/ruby/enc/test_case_mapping.rb: Tests for the above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
special cases in CaseUnfold_11_Table.
* enc/unicode.c: Adjustments for above.
* test/ruby/enc/test_case_mapping.rb: Tests for the above: Some tests in
test_titlecase activated; test_greek added. A test in test_cherokee fixed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54383 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb, casefold.h: Using above flag in data.
* enc/unicode.c: Marking capitalized character as unmodified if it is
already titlecase.
* test/ruby/enc/test_case_mapping.rb: Tests for above functionality.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54229 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* test/ruby/enc/test_case_mapping.rb: Test cases that detected
the above bugs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54140 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
and macros to work with unified CaseMappingSpecials array.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54101 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping data not available from case folding by unifying all
three cases (special title, special upper, special lower).
* enc/unicode.c: Adjust macro names for above (macros are currently inactive).
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@54085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
space for titlecase indices; adding additional macros to add or
extract titlecase index; adding comments for better documentation.
* enc/unicode.c: Moving some macros to include/ruby/oniguruma.h;
activating use of titlecase indices.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
data (new table, with indices from other tables).
* enc/unicode.c: Ignoring titlecase data indices for the moment.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53906 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
(rather than all) of target in CaseUnfold_11 array.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53843 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
single-letter; use flags in casefold.h for logic.
* enc/unicode/case-folding.rb: Added flag for case folding.
Changed parameter passing.
* enc/unicode/casefold.h: New flags added.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode.c: Added shortening macros for enc/unicode/casefold.h
* enc/unicode/case-folding.rb: Fixed file encoding for CaseFolding.txt
to ASCII-8BIT (should fix some ci errors). Clarified usage. Created
class MapItem. Partially implemented class CaseMapping.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53767 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to pass as parameters; not yet implemented or used.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53764 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
String#downcase :fold.
* enc/unicode.c: Fixed a range error (lowest non-ASCII character affected
by case operations is U+00B5, MICRO SIGN)
* test/ruby/enc/test_case_mapping.rb: Explicit test for case folding of
MICRO SIGN to Greek mu.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53749 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
option for String#downcase by using case folding data from
regular expression engine, and added a few simple tests.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53747 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend: make timestamps for each work directory, instead of
making them for each compilation and link step.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53714 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
be able to use the remaining bits for flags.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
added hand-coded support for Turkic, fixed logic for swapcase.
* string.c: Made use of new case mapping code possible from upcase,
capitalize, and swapcase (with :lithuanian as a guard).
* test/ruby/enc/test_case_mapping.rb: Adjusted for above.
(with Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
case mapping. The code path is currently guarded by the :lithuanian
option to avoid accidental problems in daily use.
* test/ruby/enc/test_case_mapping.rb: Test for above.
* string.c: function 'check_case_options': fixed logical errors
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/Makefile.in (ECHO1): expand NULLCMD to its configured value to
work around an nmake bug: nmake can expand a bare single-name
variable but cannot expand one inside a substitution.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53437 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend (enc, trans): fix version dependency, let encoding
and transcoding shared object files depend on config.status,
instead of enc.mk which is regenerated at each build, for the
RUBY_SO_NAME value used at runtime link.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53325 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend (enc, trans): fix version dependency, shared object
files depend on the RUBY_SO_NAME value for runtime link.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53324 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
from ISO-8859-2 to fix 0x80..0x9e range (from Kimihito Matsui)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53198 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
test/ruby/test_transcode.rb: Fixed encoding name
to the correct one in the IANA registry (IBM037)
and added an alias (ebcdic-cp-us)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53124 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/trans/ebcdic.trans: transcodings between EBCDIC-US
and iso-8859-1 [with code from Andrea Ribuoli]
* test/ruby/test_transcode.rb: tests for above
* tool/transcode_tablegen.rb: additional argument for
method transcode_tblgen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/windows_1252.c: separate from ISO-8859-1 to fix 0x80..0x9e
range. [ruby-core:64049] [Bug #10097]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
regular expressions from 7.0.0 to 8.0.0
(with help from Kimihito Matsui) [Feature #11563]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52612 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/{ascii,us_ascii,utf_8}.c: set encoding indexes of
fundamental built-in encodings so that they are usable in the same
way as an allocated rb_encoding before rb_enc_init().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51862 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/make_encmake.rb: @srcdir@ in enc/Makefile.in needs to be
expanded.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51770 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
not include encinit.c itself. It caused "undefined reference to
Init_encinit".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@50978 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* Makefile.in (VPATH, NEWLINE_C), common.mk (common-srcs): make
and use newline.c under enc/trans directory, not toplevel. no
longer search enc directory implicitly.
* configure.in, enc/Makefile.in (BUILTIN_ENCS, BUILTIN_TRANSES):
prefix respective directory names to builtin encodings and
transcoder source names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49317 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/prelude.rb: no longer need to load encdb and transdb here.
Init_enc should load them if possible.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/make_encmake.rb: fix typo, and use real read filename.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
for test), so should permit to run ruby if unicode_normalize.rb is
missing.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48060 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
the methods that are available on String are available without explicit require.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode.c (init_case_fold_table): no longer need to
initialize tables at runtime.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46273 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
lookup case unfolding table 3.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46272 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
lookup case unfolding table 2.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46271 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
lookup case unfolding table 1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46270 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/unicode/case-folding.rb (lookup_hash): make perfect hash to
lookup case folding table.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/jis/props.kwd: constify character property tables of JIS
based encodings by perfect hash.
* enc/euc_jp.c, enc/shift_jis.c: use character property functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46039 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* complex.c, rational.c: remove unused functions that clang 5.1
warns about, and also variables used only by the removed functions.
* ext/date/date_core.c: ditto.
* enc/utf_16be.c, enc/utf_16le.c: comment out constants only used
by commented out functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45354 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/encdb.c, enc/utf_16_32.h (ENC_DUMMY_UNICODE): Unicode with BOM
must be based on big endian variants, so that actual encodings would
work. [ruby-core:57318] [Bug #8940]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@43023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Previous table is used on Mac OS X 10.1 or prior.
This table is used on 10.2 or later. [ruby-dev:47680]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42789 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
tool/transcode-tblgen.rb: change EUC-JP-2004 to EUC-JIS-2004.
This is a follow-up to the changes in r41024.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@41035 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/depend (ENCOBJS, TRANSOBJS): use explicit path to ruby.h for
nmake.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40187 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/depend (ENCOBJS, TRANSOBJS): fix header dependency. VPATH has
$(srcdir)/include/ruby but not $(srcdir)/include, so ruby/ruby.h
cannot be found. Use ruby.h instead and ../ruby for include/ruby.h.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40186 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend (ARFLAGS, RANLIB): these values can be nil. [Bug #7950]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@39490 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend (ARFLAGS): VisualC++ linker does not allow spaces between
output option and the output file name. [Bug #7950]
* enc/depend (RANLIB): set the default command to do nothing; otherwise
the entire line becomes a label on Windows.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@39489 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/depend: fix inplace-build condition. enc.mk is generated with
$srcdir set to enc, but pwd is still the top build directory.
[ruby-core:47236] [Bug #6888]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36725 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/enc-unicode.rb: add comment explaining why it uses Hash#index.
* enc/unicode/{name2ctype.kwd,name2ctype.src,name2ctype.h.blt}:
update to follow the current name2ctype.h.
FYI current Unicode version is 6.1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ONIGERR_INVALID_CODE_POINT_VALUE if the code is invalid.
* enc/shift_jis.c (tr_next): increment character until the code
is a valid character. [ruby-dev:45652] [Bug #6450]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35724 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/encinit.c.erb: use %-lines to adjust indent in the generated file.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35670 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Fixes --with-static-linked-ext.
Patch by Google Inc. [ruby-core:45073].
* Makefile.in (ENCOBJS, EXTOBJS): New variables to specify static
linked libraries. Also reintroduces extinit.o and introduces encinit.o.
* common.mk: Builds static libraries rather than shared objects if
specified.
* configure.in (LD): new substitution.
Avoids PIE if s
* enc/depend: Supports static linked libraries
(libencs, libenc, libtrans): New target.
* enc/encinit.c.erb: new template to generate the initialization of
statically linked encodings.
* enc/make_encmake.rb (--module): new flag to specify whether static
or dynamic.
* transcode_data.h (TRANS_INIT): New macro to get rid of the name
collision of encoding initializers and transcoder initializers.
* ext/extmk.rb: Fixes the behavior when $extstatic is true.
* lib/mkmf.rb (clean-static): new target to clean up static linked
libraries.
* ruby.c (process_options): Now initializes statically linked
encodings here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35662 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
fix 'array of strings' to 'array of symbols'
[ruby-core:44152][Bug #6264]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35244 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Merge Onigmo 27278c12e6674043cc8affca6507e20e119a86ee.
* regparse.c (is_onechar_cclass): [bug] unexpected match occurs when a
char class contains no char
* enc/unicode.c (init_case_fold_table): define the sizes of case
folding tables in casefold.h
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rest_sweep() instead of it, because some dead objects might be
marked in the next mark phase by false pointers.
[ruby-core:42672]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34719 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is invalid. [Feature #5855] [Bug #5863] [Bug #5864]
* string.c (rb_str_concat): ditto.
* string.c (rb_str_concat): set encoding as ASCII-8BIT when the string
is US-ASCII and the argument is an integer greater than 127.
* regenc.c (onigenc_mb2_code_to_mbclen): rearrange error code.
* enc/euc_jp.c (code_to_mbclen): ditto.
* enc/shift_jis.c (code_to_mbclen): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34236 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
--
* enc/Makefile.in (ECHO1): Same as the recent fix in common.mk.
":" in a make variable replacement causes a syntax error with
/usr/ccs/bin/make on Solaris. Uses $(NULLCMD) instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@32787 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
CP932 UDA. Another reason is emacs-mule: the implementation of
stateless-iso-2022-jp doesn't support beyond 94x94 (0x7fxx);
but CP932 UDA is in 7Fxx-92xx.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@31366 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
patched by oCameLo oTnTh [ruby-core:33256]
* enc/big5.c: add alias Big5-HKSCS:2008 to Big5-HKSCS.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* enc/trans/utf_16_32.trans: add a converter from UTF-16 to UTF-8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29889 b2dd03c8-39d4-4d8f-98ff-823fe69b080e