The default charset of text/* media type is UTF-8.

Thanks for the patch, gareth (Gareth Adams). [Bug #15933]

-------

This combines two small but closely related changes.

1: Treat HTTPS the same as HTTP

Previously, OpenURI followed the guidance in RFC2616 section 3.7.1:

> When no explicit charset parameter is provided by the sender, media
> subtypes of the "text" type are defined to have a default charset
> value of "ISO-8859-1" when received via HTTP.

However, this RFC was written before TLS was established, and OpenURI was
never updated to treat HTTPS traffic the same way. As a result, HTTPS
documents received a different default from HTTP documents: nil instead
of "iso-8859-1".

This commit removes the scheme check so that all text/* documents
processed by OpenURI are treated the same way.

In theory this processing now applies to FTP URIs too, but OpenURI has
no mechanism for attaching Content-Type metadata to FTP documents, so in
practice this is a no-op.
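To make the old asymmetry concrete, here is a minimal Ruby sketch of the pre-change defaulting rule in isolation. `legacy_default_charset` is a hypothetical helper, not part of OpenURI; it only mirrors the `@base_uri.scheme` check that this commit removes.

```ruby
# Hypothetical helper mirroring the removed branch of OpenURI::Meta#charset:
# a text/* type got "iso-8859-1" only when the base URI scheme was "http".
def legacy_default_charset(type, scheme)
  if type && %r{\Atext/} =~ type && scheme && /\Ahttp\z/i =~ scheme
    "iso-8859-1" # RFC2616 3.7.1
  else
    nil
  end
end

legacy_default_charset("text/plain", "http")  # => "iso-8859-1"
legacy_default_charset("text/plain", "https") # => nil (the HTTPS gap)
legacy_default_charset("text/plain", "ftp")   # => nil
```

Because `/\Ahttp\z/i` anchors both ends, "https" never matches, so HTTPS responses fell through to the `nil` branch.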

2: Change default charset for text/* to UTF-8

This replaces the ISO-8859-1 default previously defined in RFC2616 (now
obsolete) with the UTF-8 default recommended by RFC6838 section 4.2.1.
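The combined rule after both changes can be sketched as a standalone function over a raw Content-Type header value. `default_charset` is hypothetical and not OpenURI's API; its naive `;` and `=` splitting is a simplification of OpenURI's own `content_type_parse`, which is stricter about header syntax.

```ruby
# Hypothetical, simplified restatement of the new OpenURI::Meta#charset
# precedence. The header parsing here is naive (split on ";" and "=") and
# only strips surrounding double quotes from parameter values.
def default_charset(content_type)
  return nil if content_type.nil?
  type, *params = content_type.split(";").map(&:strip)
  type = type.to_s.downcase
  # Find a "charset=..." parameter, ignoring parameters without a value.
  pair = params.map { |param| param.split("=", 2) }
               .find { |name, value| value && name.strip.downcase == "charset" }
  if pair
    pair.last.strip.delete('"').downcase # an explicit parameter always wins
  elsif block_given?
    yield                                # caller-supplied fallback
  elsif type.start_with?("text/")
    "utf-8"                              # new default, independent of scheme
  else
    nil                                  # non-text types get no default
  end
end

default_charset("text/plain")                    # => "utf-8"
default_charset("text/html; charset=Shift_JIS")  # => "shift_jis"
default_charset("text/plain") { "unknown" }      # => "unknown"
default_charset("application/json")              # => nil
```

The ordering matters: an explicit charset parameter still takes priority, and a block passed by the caller is consulted before the UTF-8 default, which is what the test's `f.charset { "unknown" }` assertion relies on.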

Fixes: https://bugs.ruby-lang.org/issues/15933
Tanaka Akira 2019-07-15 09:36:52 +09:00
Parent 00a97d9412
Commit 8f7884761e
3 changed files with 9 additions and 7 deletions

NEWS

@@ -212,7 +212,10 @@ Net::IMAP::
 open-uri::
   * Warn open-uri's "open" method at Kernel.
-    Use URI.open instead.
+    Use URI.open instead. [Misc #15893]
+  * The default charset of text/* media type is UTF-8 instead of ISO-8859-1.
+    [Bug #15933]
 Pathname:


@@ -552,17 +552,16 @@ module OpenURI
     # It can be used to guess charset.
     #
     # If charset parameter and block is not given,
-    # nil is returned except text type in HTTP.
-    # In that case, "iso-8859-1" is returned as defined by RFC2616 3.7.1.
+    # nil is returned except text type.
+    # In that case, "utf-8" is returned as defined by RFC6838 4.2.1
     def charset
       type, *parameters = content_type_parse
       if pair = parameters.assoc('charset')
         pair.last.downcase
       elsif block_given?
         yield
-      elsif type && %r{\Atext/} =~ type &&
-            @base_uri && /\Ahttp\z/i =~ @base_uri.scheme
-        "iso-8859-1" # RFC2616 3.7.1
+      elsif type && %r{\Atext/} =~ type
+        "utf-8" # RFC6838 4.2.1
       else
         nil
       end


@@ -648,7 +648,7 @@ class TestOpenURI < Test::Unit::TestCase
       URI.open("#{url}/nc/") {|f|
         assert_equal("aa", f.read)
         assert_equal("text/plain", f.content_type)
-        assert_equal("iso-8859-1", f.charset)
+        assert_equal("utf-8", f.charset)
         assert_equal("unknown", f.charset { "unknown" })
       }
     }
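The updated test needs a local HTTP server. A lighter way to observe the new default is to attach response metadata to an in-memory buffer, assuming a Ruby whose bundled open-uri includes this commit (Ruby 2.7 or later). `OpenURI::Meta.init` and `meta_add_field` do exist in open-uri but are marked `:nodoc:` (internal), so this sketch is for experimentation, not application code.

```ruby
require "open-uri"
require "stringio"

# Wrap an in-memory buffer the way OpenURI wraps downloaded bodies.
# OpenURI::Meta.init and #meta_add_field are internal (:nodoc:) helpers;
# they are used here only to fake a response's Content-Type header.
io = StringIO.new(+"aa")
OpenURI::Meta.init(io)
io.meta_add_field("content-type", "text/plain")

p io.content_type          # "text/plain"
p io.charset               # "utf-8" (nil before this change, since there is
                           # no base URI and therefore no "http" scheme)
p io.charset { "unknown" } # "unknown": a block still overrides the default

# An explicit charset parameter is unaffected by the new default.
sjis = StringIO.new(+"aa")
OpenURI::Meta.init(sjis)
sjis.meta_add_field("content-type", "text/html; charset=Shift_JIS")
p sjis.charset             # "shift_jis"
```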