ruby/lib/rdoc/encoding.rb

# coding: US-ASCII
# frozen_string_literal: true

##
# This module wraps File IO and Encoding to help RDoc load files and
# convert them to the correct encoding.

module RDoc::Encoding

  HEADER_REGEXP = /^
    (?:
      \A\#!.*\n
      |
      ^\#\s+frozen[-_]string[-_]literal[=:].+\n
      |
      ^\#[^\n]+\b(?:en)?coding[=:]\s*(?<name>[^\s;]+).*\n
      |
      <\?xml[^?]*encoding=(?<quote>["'])(?<name>.*?)\k<quote>.*\n
    )+
  /xi # :nodoc:

  ##
  # Reads the contents of +filename+ and handles any encoding directives in
  # the file.
  #
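  # The content is converted to +encoding+, or to Encoding.default_external
  # when +encoding+ is nil.  If the file cannot be converted, a warning is
  # printed and nil is returned.  nil is also returned, without a warning,
  # when +filename+ does not exist or is a directory.
  #
  # If +force_transcode+ is true, the document is transcoded and any
  # character that is invalid or undefined in the target encoding is
  # replaced with '?'.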

  def self.read_file filename, encoding, force_transcode = false
    content = File.open filename, "rb" do |f| f.read end
    content.gsub!("\r\n", "\n") if RUBY_PLATFORM =~ /mswin|mingw/

    utf8 = content.sub!(/\A\xef\xbb\xbf/, '')

    enc = RDoc::Encoding.detect_encoding content
    content = RDoc::Encoding.change_encoding content, enc if enc

    begin
      encoding ||= Encoding.default_external
      orig_encoding = content.encoding

      if not orig_encoding.ascii_compatible? then
        content = content.encode encoding
      elsif utf8 then
        content = RDoc::Encoding.change_encoding content, Encoding::UTF_8
        content = content.encode encoding
      else
        # assume the content is in our output encoding
        content = RDoc::Encoding.change_encoding content, encoding
      end

      unless content.valid_encoding? then
        # revert and try to transcode
        content = RDoc::Encoding.change_encoding content, orig_encoding
        content = content.encode encoding
      end

      unless content.valid_encoding? then
        warn "unable to convert #{filename} to #{encoding}, skipping"
        content = nil
      end
    rescue Encoding::InvalidByteSequenceError,
           Encoding::UndefinedConversionError => e
      if force_transcode then
        content = RDoc::Encoding.change_encoding content, orig_encoding
        content = content.encode(encoding,
                                 :invalid => :replace,
                                 :undef => :replace,
                                 :replace => '?')
        return content
      else
        warn "unable to convert #{e.message} for #{filename}, skipping"
        return nil
      end
    end

    content
  rescue ArgumentError => e
    raise unless e.message =~ /unknown encoding name - (.*)/
    warn "unknown encoding name \"#{$1}\" for #{filename}, skipping"
    nil
  rescue Errno::EISDIR, Errno::ENOENT
    nil
  end

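  ##
  # Removes a frozen_string_literal magic comment from +string+ when it is
  # the first line, or the first line after a shebang, and returns the result

  def self.remove_frozen_string_literal string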
    string =~ /\A(?:#!.*\n)?(.*\n)/
    first_line = $1

    if first_line =~ /\A# +frozen[-_]string[-_]literal[=:].+$/i
      string = string.sub first_line, ''
    end

    string
  end

  ##
  # Detects the encoding of +string+ from the magic comment or XML
  # declaration in its header.  Returns nil when no encoding is declared.

  def self.detect_encoding string
    result = HEADER_REGEXP.match string
    name = result && result[:name]

    name ? Encoding.find(name) : nil
  end

  ##
  # Blanks out the shebang and magic comments at the start of +string+,
  # keeping their newlines

  def self.remove_magic_comment string
    string.sub HEADER_REGEXP do |s|
      s.gsub(/[^\n]/, '')
    end
  end

  ##
  # Changes encoding based on +encoding+ without converting and returns new
  # string

  def self.change_encoding text, encoding
    if text.kind_of? RDoc::Comment
      text.encode! encoding
    else
      # TODO: Remove this condition after Ruby 2.2 EOL
      if RUBY_VERSION < '2.3.0'
        text.force_encoding encoding
      else
        String.new text, encoding: encoding
      end
    end
  end

end
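
The difference between relabelling bytes (what change_encoding does on Ruby 2.3 and later) and transcoding them (what read_file does with String#encode) can be sketched with core String methods alone. This is an illustrative sketch, not part of RDoc: the sample bytes and the local variable names are assumptions chosen for the example.

```ruby
# "cafe" with an acute accent, encoded as ISO-8859-1 and held as raw bytes,
# the way File.open(filename, "rb") would return them.
bytes = "caf\xE9".b

# Relabelling: the bytes are copied unchanged and only the encoding tag
# differs. This mirrors the String.new branch of change_encoding.
relabelled = String.new(bytes, encoding: Encoding::ISO_8859_1)

# Transcoding: the bytes are rewritten for the target encoding.
transcoded = relabelled.encode(Encoding::UTF_8)

# Tagging the Latin-1 bytes as UTF-8 produces an invalid string. read_file
# checks for this with valid_encoding? before it falls back to transcoding.
mislabelled = String.new(bytes, encoding: Encoding::UTF_8)

# Lossy conversion with the same options read_file uses when
# force_transcode is true: unrepresentable characters become '?'.
lossy = transcoded.encode(Encoding::US_ASCII,
                          invalid: :replace,
                          undef: :replace,
                          replace: '?')

p relabelled.bytes            # => [99, 97, 102, 233]
p transcoded.bytes            # => [99, 97, 102, 195, 169]
p mislabelled.valid_encoding? # => false
p lossy                       # => "caf?"
```

Relabelling is cheap but only correct when the bytes are already in the named encoding, which is why read_file validates the result and transcodes from the original encoding when validation fails.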