* string.c (count_utf8_lead_bytes_with_ulong): fix shift size.

[ruby-dev:33993]

* string.c (str_utf8_nth) fix wrong counting.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15700 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
naruse 2008-03-05 19:34:15 +00:00
Parent 579f16d985
Commit 98cbcf1bd7
4 changed files with 28 additions and 14 deletions


@@ -1,3 +1,10 @@
+Thu Mar 6 04:32:06 2008 NARUSE, Yui <naruse@ruby-lang.org>
+* string.c (count_utf8_lead_bytes_with_ulong): fix shift size.
+[ruby-dev:33993]
+* string.c (str_utf8_nth) fix wrong counting.
 Thu Mar 6 00:34:00 2008 Nobuyoshi Nakada <nobu@ruby-lang.org>
 * sprintf.c (rb_str_format): size_t returned from strlen() can be
@@ -7,7 +14,7 @@ Thu Mar 6 00:31:39 2008 Nobuyoshi Nakada <nobu@ruby-lang.org>
 * struct.c (make_struct): preserve encoding of struct name.
-Wed Mar 05 22:49:20 2008 NARUSE, Yui <naruse@ruby-lang.org>
+Wed Mar 5 22:49:20 2008 NARUSE, Yui <naruse@ruby-lang.org>
 * string.c (is_utf8_lead_byte, count_utf8_lead_bytes_with_ulong):
 defined for UTF-8 optimization.
@@ -22,7 +29,7 @@ Wed Mar 5 17:53:01 2008 Nobuyoshi Nakada <nobu@ruby-lang.org>
 * file.c (rb_file_flock): returns false on EAGAIN if non-blocking.
 [ruby-core:15795]
-Web Mar 5 17:43:43 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
+Wed Mar 5 17:43:43 2008 Martin Duerst <duerst@it.aoyama.ac.jp>
 * transcode.c (transcode_loop): Adjusted detection of invalid
 (ill-formed) UTF-8 sequences. Fixing potential security issue, see
@@ -85,7 +92,7 @@ Tue Mar 4 16:29:06 2008 Yukihiro Matsumoto <matz@ruby-lang.org>
 * parse.y (parser_yylex): disallow non digits '0o' expression.
-Tue Mar 04 14:35:12 2008 NARUSE, Yui <naruse@ruby-lang.org>
+Tue Mar 4 14:35:12 2008 NARUSE, Yui <naruse@ruby-lang.org>
 * io.c (open_key_args): use rb_io_open_with_args instead of rb_f_open.
 [ruby-core:15763]
@@ -99,7 +106,7 @@ Tue Mar 4 10:21:03 2008 Nobuyoshi Nakada <nobu@ruby-lang.org>
 * gc.c (add_heap): use binary search to find the place to insert the
 new heap slot. [ruby-dev:33983]
-Tue Mar 04 05:30:31 2008 NARUSE, Yui <naruse@ruby-lang.org>
+Tue Mar 4 05:30:31 2008 NARUSE, Yui <naruse@ruby-lang.org>
 * io.c (open_key_args): use rb_io_open instead of rb_f_open.
 [ruby-core:15746]
@@ -189,7 +196,7 @@ Sat Mar 1 12:15:42 2008 Yukihiro Matsumoto <matz@ruby-lang.org>
 * thread.c (remove_event_hook): should not access freed memory.
 [ruby-dev:31820]
-Sat Mar 01 10:31:19 2008 NARUSE, Yui <naruse@ruby-lang.org>
+Sat Mar 1 10:31:19 2008 NARUSE, Yui <naruse@ruby-lang.org>
 * io.c (read_all, rb_io_getline_fast): encoding is io_input_encoding.


@@ -763,7 +763,7 @@ count_utf8_lead_bytes_with_ulong(const unsigned long *s)
     unsigned long d = *s;
     d |= ~(d>>1);
     d >>= 6;
-    d &= NONASCII_MASK >> 3;
+    d &= NONASCII_MASK >> 7;
     d += (d>>8);
     d += (d>>16);
 #if NONASCII_MASK == 0x8080808080808080UL
@@ -1177,11 +1177,10 @@ str_utf8_nth(const char *p, const char *e, int nth)
             if (is_utf8_lead_byte(*p)) nth--;
             p++;
         }
-        while (s < t) {
+        do {
             nth -= count_utf8_lead_bytes_with_ulong(s);
-            if (nth < sizeof(long)) break;
             s++;
-        }
+        } while (s < t && sizeof(long) <= nth);
         p = (char *)s;
     }
     if (0 < nth) {
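
Note on the two string.c hunks above: the following is a minimal standalone sketch of the idea behind them, not Ruby's actual implementation. It assumes well-formed UTF-8, uses `uint64_t` with `memcpy` in place of an aligned `unsigned long` load, and the names `LOW_BIT_PER_BYTE`, `is_lead_byte`, `count_lead_bytes_in_word` and `utf8_offset_of_char` are hypothetical. A byte starts a character unless its top two bits are `10`; after `d |= ~(d>>1)` and `d >>= 6`, that flag sits in bit 0 of every byte, so the mask has to be `0x01` per byte (`NONASCII_MASK >> 7`). Shifting the mask by 3 instead yields `0x10` per byte, which selects the wrong bit. The lookup loop then always steps past a word once it has been counted, so no word is counted twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x01 in every byte: the 64-bit equivalent of NONASCII_MASK >> 7. */
#define LOW_BIT_PER_BYTE UINT64_C(0x0101010101010101)

/* Byte-at-a-time reference: anything except 10xxxxxx starts a character. */
static int is_lead_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Count the lead bytes among the 8 bytes starting at p. */
static unsigned count_lead_bytes_in_word(const unsigned char *p)
{
    uint64_t d;
    memcpy(&d, p, sizeof d);  /* alignment-safe load (assumption of this sketch) */
    d |= ~(d >> 1);           /* bit 6 of each byte becomes bit6 | !bit7 */
    d >>= 6;                  /* move that flag down to bit 0 of each byte */
    d &= LOW_BIT_PER_BYTE;    /* keep exactly one flag per byte */
    d += d >> 8;              /* horizontal add: pairs */
    d += d >> 16;             /* groups of four */
    d += d >> 32;             /* all eight, at most 8, so it fits in 4 bits */
    return (unsigned)(d & 0xF);
}

/* Byte offset of the n-th character (0-based), or len if there is none. */
static size_t utf8_offset_of_char(const unsigned char *s, size_t len, size_t n)
{
    size_t pos = 0;

    /* A word holds at most 8 lead bytes, so while n >= 8 the target
       cannot lie inside the next word and the whole word can be skipped. */
    while (len - pos >= sizeof(uint64_t) && n >= sizeof(uint64_t)) {
        n -= count_lead_bytes_in_word(s + pos);
        pos += sizeof(uint64_t);  /* always advance past the counted word */
    }

    /* Finish byte by byte. */
    for (; pos < len; pos++) {
        if (is_lead_byte(s[pos])) {
            if (n == 0) return pos;
            n--;
        }
    }
    return len;
}
```

With the 41-byte string from `test_utf8str_aref` (26 ASCII letters followed by five 3-byte hiragana), `utf8_offset_of_char(s, 41, 27)` should land on byte 29, the first byte of U+3044.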


@@ -810,6 +810,17 @@ class TestM17N < Test::Unit::TestCase
     assert_equal(false, str[0..-1].ascii_only?)
   end
+  def test_utf8str_aref
+    s = "abcdefghijklmnopqrstuvwxyz\u{3042 3044 3046 3048 304A}"
+    assert_equal("a", s[0])
+    assert_equal("h", s[7])
+    assert_equal("i", s[8])
+    assert_equal("j", s[9])
+    assert_equal("\u{3044}", s[27])
+    assert_equal("\u{3046}", s[28])
+    assert_equal("\u{3048}", s[29])
+  end
   def test_str_aref_len
     assert_equal(a("\xa1"), a("\xc2\xa1\xc2\xa2\xc2\xa3")[1, 1])
     assert_equal(a("\xa1\xc2"), a("\xc2\xa1\xc2\xa2\xc2\xa3")[1, 2])


@@ -123,6 +123,8 @@ class TestRegexp < Test::Unit::TestCase
     r = /./
     m = r.match("a")
     assert_equal(r, m.regexp)
+    re = /foo/
+    assert_equal(re, re.match("foo").regexp)
   end
   def test_source
@@ -188,11 +190,6 @@ class TestRegexp < Test::Unit::TestCase
     assert_equal(/foo/, m.dup.regexp)
   end
-  def test_match_regexp
-    re = /foo/
-    assert_equal(re, re.match("foo").regexp)
-  end
   def test_match_size
     m = /(.)(.)(\d+)(\d)/.match("THX1138.")
     assert_equal(5, m.size)