* lib/unicode_normalize/normalize.rb: Fix the range check for trailing
Hangul jamo characters in Unicode normalization. Different from
leading or vowel jamos, where LBASE and VBASE are actual characters,
a value equal to TBASE expresses the absence of a trailing jamo.
This fix is technically correct, but there was no bug because
the regular expressions in lib/unicode_normalize/tables.rb
eliminate jamos equal to TBASE from normalization processing.
* test/test_unicode_normalize.rb: Add preventive test
test_no_trailing_jamo based on
d134809cd3
just for the case we ever get a regression.
This closes issue #14934, thanks to MaLin (Lin Ma) for reporting.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In lib/unicode_normalize/normalize.rb, add explanations and clarifications
about the status of the files and the module. [ci skip]
This is in response to discussions at https://github.com/ruby/spec/pull/433
and https://bugs.ruby-lang.org/issues/5481#note-58.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58617 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalized?
(including documentation). Leave a comment explaining that the file is now empty.
* string.c: Define String#unicode_normalized? in rb_str_unicode_normalized_p in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalized? to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize!
(including documentation)
* string.c: Define String#unicode_normalize! in rb_str_unicode_normalize_bang in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalize! to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize.rb: Remove definition of String#unicode_normalize
(including documentation)
* string.c: Define String#unicode_normalize in rb_str_unicode_normalize in C,
(including documentation)
* lib/unicode_normalize/normalize.rb: Remove (re)definition of
String#unicode_normalize to avoid warnings (when $VERBOSE==true) and
problems when String is frozen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
simplify String#unicode_normalize! and #unicode_normalized?
in lib/unicode_normalize.rb by redefining them
in lib/unicode_normalize/normalize.rb
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58538 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
simplify String#unicode_normalize in lib/unicode_normalize.rb
by redefining it in lib/unicode_normalize/normalize.rb
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58537 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
When you change this to true, you may need to add more tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53141 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize/normalize.rb (UnicodeNormalize): REGEXP_K
matches only single chars which are keys of KOMPATIBLE_TABLE, so
string in nfkd_one is always single char and one of the key of
KOMPATIBLE_TABLE, that is that the default proc of NF_HASH_K only
copies a pair in KOMPATIBLE_TABLE. therefore NF_HASH_K is a
part of KOMPATIBLE_TABLE always, and just redundant.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* template/unicode_norm_gen.tmpl: expand kompatible_table so that
recursive expansion is not needed at runtime.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize/normalize.rb (canonical_ordering_one):
use explicit separator, not to depend on $,.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
as trivially supported encoding (is always normalized,
and may appear mixed in with UTF-8 or other Unicode
encodings).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48134 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
relevant portion of Unicode standard for Hangul (de)composition
identifiers and algorithm.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize/normalize.rb (hangul_decomp_one): use more
descriptive name. leave [SLVT]BASE and [LVTNS]COUNT as they are
vague names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48055 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is not hungarian notation. The variable name sIndex is
directly taken from the relevant part of the Unicode
Standard, where it is written SIndex and stands for
'syllable index'. See pp. 144/145 of
http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize/normalize.rb (NF_HASH_{D,C,K}): remove
first element by Hash#shift.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* lib/unicode_normalize/normalize.rb (UnicodeNormalize): use self
instead of module name and remove module name if unnecessary.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
File name change from lib/unicode_normalize/normalize_tables.rb
to lib/unicode_normalize/tables.rb.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48015 b2dd03c8-39d4-4d8f-98ff-823fe69b080e