[ruby/csv] Enhanced Rdoc for CSV (#122)

https://github.com/ruby/csv/commit/cd670595d5
Burdette Lamar 2020-05-12 16:42:45 -05:00, committed by Nobuyoshi Nakada
Parent 033514c62f
Commit 6ba1abd40c
No known key found for this signature
GPG key ID: 7CD2805BFA3770C6
22 changed files with 958 additions and 212 deletions

doc/csv/col_sep.rdoc Normal file

@ -0,0 +1,45 @@
====== Option +col_sep+
Specifies the \String field separator to be used
for both parsing and generating.
The \String will be transcoded into the data's \Encoding before use.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:col_sep) # => "," (comma)
For the examples in this section:
ary = ['a', 'b', 'c']
Using the default:
str = CSV.generate_line(ary)
str # => "a,b,c\n"
ary = CSV.parse_line(str)
ary # => ["a", "b", "c"]
Using +:+ (colon):
col_sep = ':'
str = CSV.generate_line(ary, col_sep: col_sep)
str # => "a:b:c\n"
ary = CSV.parse_line(str, col_sep: col_sep)
ary # => ["a", "b", "c"]
Using +::+ (two colons):
col_sep = '::'
str = CSV.generate_line(ary, col_sep: col_sep)
str # => "a::b::c\n"
ary = CSV.parse_line(str, col_sep: col_sep)
ary # => ["a", "b", "c"]
---
Raises an exception if given the empty \String:
col_sep = ''
# Raises ArgumentError (:col_sep must be 1 or more characters: "")
CSV.parse_line("a:b:c\n", col_sep: col_sep)
Raises an exception if the given value is not String-convertible:
col_sep = BasicObject.new
# Raises NoMethodError (undefined method `to_s' for #<BasicObject:>)
CSV.generate_line(ary, col_sep: col_sep)
# Raises NoMethodError (undefined method `to_s' for #<BasicObject:>)
CSV.parse(str, col_sep: col_sep)
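A tab separator is a common variant (tab-separated values). In this sketch, with made-up data, the comma inside the second field is not the separator, so the generator leaves that field unquoted, and parsing with the same +col_sep+ recovers the original \Array:

```ruby
require "csv"

# Build one tab-separated line. The comma in the second field is not the
# separator here, so the field is written without quotes.
row = ["name", "value with, comma", "3"]
tsv = CSV.generate_line(row, col_sep: "\t")
p tsv # => "name\tvalue with, comma\t3\n"

# Parsing with the same separator recovers the original fields.
p CSV.parse_line(tsv, col_sep: "\t") # => ["name", "value with, comma", "3"]
```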

doc/csv/converters.rdoc Normal file

@ -0,0 +1,45 @@
====== Option +converters+
Specifies a single field converter name or \Proc,
or an \Array of field converter names and Procs.
See {Field Converters}[#class-CSV-label-Field+Converters]
Default value:
CSV::DEFAULT_OPTIONS.fetch(:converters) # => nil
The value may be a single field converter name:
str = '1,2,3'
# Without a converter
ary = CSV.parse_line(str)
ary # => ["1", "2", "3"]
# With built-in converter :integer
ary = CSV.parse_line(str, converters: :integer)
ary # => [1, 2, 3]
The value may be an \Array of field converter names:
str = '1,3.14159'
# Without converters
ary = CSV.parse_line(str)
ary # => ["1", "3.14159"]
# With built-in converters
ary = CSV.parse_line(str, converters: [:integer, :float])
ary # => [1, 3.14159]
The value may be a \Proc custom converter:
str = ' foo , bar , baz '
# Without a converter
ary = CSV.parse_line(str)
ary # => [" foo ", " bar ", " baz "]
# With a custom converter
ary = CSV.parse_line(str, converters: proc {|field| field.strip })
ary # => ["foo", "bar", "baz"]
See also {Custom Converters}[#class-CSV-label-Custom+Converters]
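A \Proc converter that declares two parameters also receives a CSV::FieldInfo for each field. As an illustrative sketch, the converter below (its name +count_to_integer+ is made up for this example) uses the field's header to convert only the +Count+ column, so the digit-only +Name+ value <tt>"007"</tt> stays a \String:

```ruby
require "csv"

# Illustrative converter: converts only fields under the "Count" header.
# A proc with two parameters is passed the field and a CSV::FieldInfo.
count_to_integer = proc do |field, field_info|
  field_info.header == "Count" ? Integer(field) : field
end

str = "Name,Count\n007,0\nbar,1\n"
table = CSV.parse(str, headers: true, converters: count_to_integer)
p table.map { |row| row["Name"] }  # => ["007", "bar"]
p table.map { |row| row["Count"] } # => [0, 1]
```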
---
Raises an exception if the converter is not a converter name or a \Proc:
str = 'foo,0'
# Raises NoMethodError (undefined method `arity' for nil:NilClass)
CSV.parse(str, converters: :foo)
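When +converters+ is an \Array, the converters are applied to each field in order, and the chain stops for a field as soon as a converter returns something other than a \String. The sketch below, using a hypothetical +strip_dollar+ converter, shows how the order changes the result:

```ruby
require "csv"

# Hypothetical custom converter: removes a leading "$" from a field.
strip_dollar = proc { |field| field.delete_prefix("$") }

str = "$5,$7,x"

# Custom converter first: "$5" becomes "5", then :integer turns it into 5.
custom_first = CSV.parse_line(str, converters: [strip_dollar, :integer])
p custom_first # => [5, 7, "x"]

# Built-in converter first: Integer("$5") fails, so :integer leaves "$5"
# unchanged; strip_dollar then yields "5", which is still a String.
builtin_first = CSV.parse_line(str, converters: [:integer, strip_dollar])
p builtin_first # => ["5", "7", "x"]
```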

doc/csv/empty_value.rdoc Normal file

@ -0,0 +1,13 @@
====== Option +empty_value+
Specifies the object that is to be substituted
for each field that has an empty \String.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:empty_value) # => "" (empty string)
With the default, <tt>""</tt>:
CSV.parse_line('a,"",b,"",c') # => ["a", "", "b", "", "c"]
With a different object:
CSV.parse_line('a,"",b,"",c', empty_value: 'x') # => ["a", "x", "b", "x", "c"]

@ -0,0 +1,39 @@
====== Option +field_size_limit+
Specifies the \Integer field size limit.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:field_size_limit) # => nil
This is the maximum size CSV will read ahead looking for the closing quote for a field.
(In truth, it reads to the first line ending beyond this size.)
If a quote cannot be found within the limit, CSV will raise a MalformedCSVError,
assuming the data is faulty.
You can use this limit to prevent what are effectively DoS attacks on the parser.
However, this limit can cause a legitimate parse to fail;
therefore the default value is +nil+ (no limit).
For the examples in this section:
str = <<~EOT
"a","b"
"
2345
",""
EOT
str # => "\"a\",\"b\"\n\"\n2345\n\",\"\"\n"
Using the default +nil+:
ary = CSV.parse(str)
ary # => [["a", "b"], ["\n2345\n", ""]]
Using <tt>50</tt>:
field_size_limit = 50
ary = CSV.parse(str, field_size_limit: field_size_limit)
ary # => [["a", "b"], ["\n2345\n", ""]]
---
Raises an exception if a field is too long:
big_str = "123456789\n" * 1024
# Raises CSV::MalformedCSVError (Field size exceeded in line 1.)
CSV.parse('valid,fields,"' + big_str + '"', field_size_limit: 2048)
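The sketch below contrasts the default with a small limit, using a quoted field that spans many lines (the data is made up for illustration). Because an exceeded limit surfaces as CSV::MalformedCSVError, code that sets a limit will usually want to rescue it:

```ruby
require "csv"

# Made-up data: a quoted field spanning 100 lines (200 characters).
big = "x\n" * 100
data = 'a,"' + big + '"'

# With the default (no limit), the long field parses normally.
rows = CSV.parse(data)
p rows.first.last.size # => 200

# With a limit far below the field's size, the parser raises instead.
outcome =
  begin
    CSV.parse(data, field_size_limit: 50)
    :parsed
  rescue CSV::MalformedCSVError
    :rejected
  end
p outcome # => :rejected
```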

doc/csv/force_quotes.rdoc Normal file

@ -0,0 +1,17 @@
====== Option +force_quotes+
Specifies the boolean that determines whether each output field is to be double-quoted.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:force_quotes) # => false
For examples in this section:
ary = ['foo', 0, nil]
Using the default, +false+:
str = CSV.generate_line(ary)
str # => "foo,0,\n"
Using +true+:
str = CSV.generate_line(ary, force_quotes: true)
str # => "\"foo\",\"0\",\"\"\n"

@ -0,0 +1,31 @@
====== Option +header_converters+
Specifies a \Symbol converter name or an \Array of converter names.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:header_converters) # => nil
Identical in functionality to option {converters}[#class-CSV-label-Option+converters]
except that:
- The converters apply only to the header row.
- The built-in header converters are +:downcase+ and +:symbol+.
Examples:
str = <<-EOT
foo,0
bar,1
baz,2
EOT
headers = ['Name', 'Value']
# With no header converter
csv = CSV.parse(str, headers: headers)
csv.headers # => ["Name", "Value"]
# With header converter :downcase
csv = CSV.parse(str, headers: headers, header_converters: :downcase)
csv.headers # => ["name", "value"]
# With header converter :symbol
csv = CSV.parse(str, headers: headers, header_converters: :symbol)
csv.headers # => [:name, :value]
# With both
csv = CSV.parse(str, headers: headers, header_converters: [:downcase, :symbol])
csv.headers # => [:name, :value]
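Built-in converter +:symbol+ also handles headers containing spaces, which makes the resulting rows convenient to index by \Symbol. A short sketch with made-up data:

```ruby
require "csv"

# Headers with mixed case and an embedded space.
str = "First Name,Age\nAda,36\n"
table = CSV.parse(str, headers: true, header_converters: :symbol)
p table.headers            # => [:first_name, :age]

# Rows can then be indexed by Symbol.
p table.first[:first_name] # => "Ada"
```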

doc/csv/headers.rdoc Normal file

@ -0,0 +1,63 @@
====== Option +headers+
Specifies a boolean, \Symbol, \Array, or \String to be used
to define column headers.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:headers) # => false
---
Without +headers+:
str = <<-EOT
Name,Count
foo,0
bar,1
bax,2
EOT
csv = CSV.new(str)
csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
csv.headers # => nil
csv.shift # => ["Name", "Count"]
---
If set to +true+ or the \Symbol +:first_row+,
the first row of the data is treated as a row of headers:
str = <<-EOT
Name,Count
foo,0
bar,1
bax,2
EOT
csv = CSV.new(str, headers: true)
csv.shift # => #<CSV::Row "Name":"foo" "Count":"0">
csv # => #<CSV io_type:StringIO encoding:UTF-8 lineno:2 col_sep:"," row_sep:"\n" quote_char:"\"" headers:["Name", "Count"]>
csv.headers # => ["Name", "Count"]
---
If set to an \Array, the \Array elements are treated as headers:
str = <<-EOT
foo,0
bar,1
bax,2
EOT
csv = CSV.new(str, headers: ['Name', 'Count'])
csv.headers # => ["Name", "Count"]
csv.shift # => #<CSV::Row "Name":"foo" "Count":"0">
---
If set to a \String +str+, method <tt>CSV::parse_line(str, options)</tt> is called
with the current +options+, and the returned \Array is treated as headers:
str = <<-EOT
foo,0
bar,1
bax,2
EOT
csv = CSV.new(str, headers: 'Name,Count')
csv.headers # => ["Name", "Count"]
csv.shift # => #<CSV::Row "Name":"foo" "Count":"0">
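When headers are in effect, each row is a CSV::Row, which can be turned into a \Hash keyed by header. A minimal sketch with made-up data:

```ruby
require "csv"

str = "Name,Count\nfoo,0\nbar,1\n"
csv = CSV.new(str, headers: true)

# CSV includes Enumerable, so map iterates over the data rows;
# CSV::Row#to_h turns each row into a Hash keyed by header.
hashes = csv.map(&:to_h)
p hashes.size          # => 2
p hashes.first["Name"] # => "foo"
p hashes.last["Count"] # => "1"
```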

@ -0,0 +1,19 @@
====== Option +liberal_parsing+
Specifies the boolean value that determines whether
CSV will attempt to parse input not conformant with RFC 4180,
such as double quotes in unquoted fields.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:liberal_parsing) # => false
For examples in this section:
str = 'is,this "three, or four",fields'
Without +liberal_parsing+:
# Raises CSV::MalformedCSVError (Illegal quoting in line 1.)
CSV.parse_line(str)
With +liberal_parsing+:
ary = CSV.parse_line(str, liberal_parsing: true)
ary # => ["is", "this \"three", " or four\"", "fields"]
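One possible pattern, sketched here with a hypothetical helper named +parse_line_leniently+, is to parse strictly by default and enable +liberal_parsing+ only for lines the strict parser rejects:

```ruby
require "csv"

# Hypothetical helper: strict parsing first, liberal parsing only as a fallback.
def parse_line_leniently(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  CSV.parse_line(line, liberal_parsing: true)
end

strict = parse_line_leniently("a,b,c")
p strict # => ["a", "b", "c"]

# This line has a double quote inside an unquoted field, so the strict
# parse raises and the helper retries with liberal_parsing: true.
lenient = parse_line_leniently('is,this "three, or four",fields')
p lenient
```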

doc/csv/nil_value.rdoc Normal file

@ -0,0 +1,12 @@
====== Option +nil_value+
Specifies the object that is to be substituted for each null (no-text) field.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:nil_value) # => nil
With the default, +nil+:
CSV.parse_line('a,,b,,c') # => ["a", nil, "b", nil, "c"]
With a different object:
CSV.parse_line('a,,b,,c', nil_value: 0) # => ["a", 0, "b", 0, "c"]
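Options +nil_value+ and +empty_value+ can be combined, since the parser distinguishes an unquoted empty field (null) from a quoted empty field (empty \String). A minimal sketch; the Symbols +:null+ and +:blank+ are arbitrary placeholders:

```ruby
require "csv"

# An unquoted empty field (,,) is a null field; a quoted empty field ("")
# is an empty String. The two options replace them independently.
row = CSV.parse_line('a,,"",b', nil_value: :null, empty_value: :blank)
p row # => ["a", :null, :blank, "b"]
```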

doc/csv/quote_char.rdoc Normal file

@ -0,0 +1,32 @@
====== Option +quote_char+
Specifies the character (\String of length 1) used to quote fields
in both parsing and generating.
The \String will be transcoded into the data's \Encoding before use.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:quote_char) # => "\"" (double quote)
This is useful for an application that incorrectly uses <tt>'</tt> (single-quote)
to quote fields, instead of the correct <tt>"</tt> (double-quote).
Using the default:
ary = ['a', 'b', '"c"', 'd']
str = CSV.generate_line(ary)
str # => "a,b,\"\"\"c\"\"\",d\n"
ary = CSV.parse_line(str)
ary # => ["a", "b", "\"c\"", "d"]
Using <tt>'</tt> (single-quote):
quote_char = "'"
ary = ['a', 'b', '\'c\'', 'd']
str = CSV.generate_line(ary, quote_char: quote_char)
str # => "a,b,'''c''',d\n"
ary = CSV.parse_line(str, quote_char: quote_char)
ary # => ["a", "b", "'c'", "d"]
---
Raises an exception if the \String length is greater than 1:
# Raises ArgumentError (:quote_char has to be nil or a single character String)
CSV.new('', quote_char: 'xx')

doc/csv/quote_empty.rdoc Normal file

@ -0,0 +1,12 @@
====== Option +quote_empty+
Specifies the boolean that determines whether an empty value is to be double-quoted.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:quote_empty) # => true
With the default +true+:
CSV.generate_line(['"', ""]) # => "\"\"\"\",\"\"\n"
With +false+:
CSV.generate_line(['"', ""], quote_empty: false) # => "\"\"\"\",\n"

@ -0,0 +1,22 @@
====== Option +return_headers+
Specifies the boolean that determines whether method #shift
returns or ignores the header row.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:return_headers) # => false
Examples:
str = <<-EOT
Name,Count
foo,0
bar,1
bax,2
EOT
# Without return_headers, the first row returned is the first data row.
csv = CSV.new(str, headers: true)
csv.shift # => #<CSV::Row "Name":"foo" "Count":"0">
# With return_headers, the first row returned is the header row.
csv = CSV.new(str, headers: true, return_headers: true)
csv.shift # => #<CSV::Row "Name":"Name" "Count":"Count">
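With +return_headers+, the returned header row can be told apart from data rows via CSV::Row#header_row? and CSV::Row#field_row?. A minimal sketch with made-up data:

```ruby
require "csv"

str = "Name,Count\nfoo,0\n"
csv = CSV.new(str, headers: true, return_headers: true)

# The first shift returns the header row itself.
header = csv.shift
p header.header_row? # => true

# Later shifts return ordinary data rows.
data = csv.shift
p data.field_row?    # => true
p data["Name"]       # => "foo"
```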

doc/csv/row_sep.rdoc Normal file

@ -0,0 +1,91 @@
====== Option +row_sep+
Specifies the row separator, a \String or the \Symbol <tt>:auto</tt> (see below),
to be used for both parsing and generating.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:row_sep) # => :auto
---
When +row_sep+ is a \String, that \String becomes the row separator.
The String will be transcoded into the data's Encoding before use.
Using <tt>"\n"</tt>:
str = CSV.generate do |csv|
csv << [:foo, 0]
csv << [:bar, 1]
csv << [:baz, 2]
end
str # => "foo,0\nbar,1\nbaz,2\n"
ary = CSV.parse(str)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
Using <tt>|</tt> (pipe):
row_sep = '|'
str = CSV.generate(row_sep: row_sep) do |csv|
csv << [:foo, 0]
csv << [:bar, 1]
csv << [:baz, 2]
end
str # => "foo,0|bar,1|baz,2|"
ary = CSV.parse(str, row_sep: row_sep)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
Using <tt>--</tt> (two hyphens):
row_sep = '--'
str = CSV.generate(row_sep: row_sep) do |csv|
csv << [:foo, 0]
csv << [:bar, 1]
csv << [:baz, 2]
end
str # => "foo,0--bar,1--baz,2--"
ary = CSV.parse(str, row_sep: row_sep)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
Using <tt>''</tt> (empty string):
row_sep = ''
str = CSV.generate(row_sep: row_sep) do |csv|
csv << [:foo, 0]
csv << [:bar, 1]
csv << [:baz, 2]
end
str # => "foo,0bar,1baz,2"
ary = CSV.parse(str, row_sep: row_sep)
ary # => [["foo", "0bar", "1baz", "2"]]
---
When +row_sep+ is the \Symbol +:auto+ (the default),
invokes auto-discovery of the row separator.
Auto-discovery reads ahead in the data looking for the next <tt>\r\n</tt>, +\n+, or +\r+ sequence.
The sequence will be selected even if it occurs in a quoted field,
assuming that you would have the same line endings there.
row_sep = :auto
str = CSV.generate(row_sep: row_sep) do |csv|
csv << [:foo, 0]
csv << [:bar, 1]
csv << [:baz, 2]
end
str # => "foo,0\nbar,1\nbaz,2\n"
ary = CSV.parse(str, row_sep: row_sep)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
The default <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>) is used
if any of the following is true:
* None of those sequences is found.
* Data is +ARGF+, +STDIN+, +STDOUT+, or +STDERR+.
* The stream is only available for output.
Auto-discovery takes a little time; set +row_sep+ explicitly if speed is important. Also note that on Windows, IO objects should be opened in binary mode if this feature will be used, because line-ending translation can cause problems when resetting the document position to where it was before the read-ahead.
---
Raises an exception if the given value is not String-convertible:
row_sep = BasicObject.new
# Raises NoMethodError (undefined method `to_s' for #<BasicObject:>)
CSV.generate_line(ary, row_sep: row_sep)
# Raises NoMethodError (undefined method `to_s' for #<BasicObject:>)
CSV.parse(str, row_sep: row_sep)
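As a further sketch, generating with a <tt>"\r\n"</tt> separator and then parsing with the default +:auto+ shows auto-discovery picking up Windows-style line endings:

```ruby
require "csv"

# Generate with Windows-style line endings.
str = CSV.generate(row_sep: "\r\n") do |csv|
  csv << ["foo", 0]
  csv << ["bar", 1]
end
p str # => "foo,0\r\nbar,1\r\n"

# With the default row_sep (:auto), parsing detects "\r\n" by itself.
p CSV.parse(str) # => [["foo", "0"], ["bar", "1"]]
```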

doc/csv/skip_blanks.rdoc Normal file

@ -0,0 +1,31 @@
====== Option +skip_blanks+
Specifies a boolean that determines whether blank lines in the input will be ignored;
a line that contains a column separator is not considered to be blank.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:skip_blanks) # => false
See also option {skip_lines}[#class-CSV-label-Option+skip_lines].
For the examples in this section:
str = <<-EOT
foo,0

bar,1
baz,2

,
EOT
Using the default, +false+:
ary = CSV.parse(str)
ary # => [["foo", "0"], [], ["bar", "1"], ["baz", "2"], [], [nil, nil]]
Using +true+:
ary = CSV.parse(str, skip_blanks: true)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]]
Using a truthy value:
ary = CSV.parse(str, skip_blanks: :foo)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"], [nil, nil]]

doc/csv/skip_lines.rdoc Normal file

@ -0,0 +1,37 @@
====== Option +skip_lines+
Specifies an object to use in identifying comment lines in the input that are to be ignored:
* If a \Regexp, ignores lines that match it.
* If a \String, converts it to a \Regexp and ignores lines that match it.
* If +nil+, no lines are considered to be comments.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:skip_lines) # => nil
For examples in this section:
str = <<-EOT
# Comment
foo,0
bar,1
baz,2
# Another comment
EOT
str # => "# Comment\nfoo,0\nbar,1\nbaz,2\n# Another comment\n"
Using the default, +nil+:
ary = CSV.parse(str)
ary # => [["# Comment"], ["foo", "0"], ["bar", "1"], ["baz", "2"], ["# Another comment"]]
Using a \Regexp:
ary = CSV.parse(str, skip_lines: /^#/)
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
Using a \String:
ary = CSV.parse(str, skip_lines: '#')
ary # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]
---
Raises an exception if given an object that is not a \Regexp, a \String, or +nil+:
# Raises ArgumentError (:skip_lines has to respond to #match: 0)
CSV.parse(str, skip_lines: 0)
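Options +skip_lines+ and +skip_blanks+ can be used together, for example to ignore both comment lines and empty lines. A minimal sketch with made-up data:

```ruby
require "csv"

str = "# Comment\nfoo,0\n\nbar,1\n"

# skip_lines drops the comment line; skip_blanks drops the empty line.
rows = CSV.parse(str, skip_lines: /^#/, skip_blanks: true)
p rows # => [["foo", "0"], ["bar", "1"]]
```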

doc/csv/strip.rdoc Normal file

@ -0,0 +1,15 @@
====== Option +strip+
Specifies the boolean value that determines whether
whitespace is stripped from each input field.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:strip) # => false
With default value +false+:
ary = CSV.parse_line(' a , b ')
ary # => [" a ", " b "]
With value +true+:
ary = CSV.parse_line(' a , b ', strip: true)
ary # => ["a", "b"]

@ -0,0 +1,27 @@
====== Option +unconverted_fields+
Specifies the boolean that determines whether unconverted field values are to be available.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:unconverted_fields) # => nil
The unconverted field values are those found in the source data,
prior to any conversions performed via option +converters+.
When option +unconverted_fields+ is +true+,
each returned row (\Array or \CSV::Row) has an added method,
+unconverted_fields+, that returns the unconverted field values:
str = <<-EOT
foo,0
bar,1
baz,2
EOT
# Without unconverted_fields
csv = CSV.parse(str, converters: :integer)
csv # => [["foo", 0], ["bar", 1], ["baz", 2]]
csv.first.respond_to?(:unconverted_fields) # => false
# With unconverted_fields
csv = CSV.parse(str, converters: :integer, unconverted_fields: true)
csv # => [["foo", 0], ["bar", 1], ["baz", 2]]
csv.first.respond_to?(:unconverted_fields) # => true
csv.first.unconverted_fields # => ["foo", "0"]

@ -0,0 +1,31 @@
====== Option +write_converters+
Specifies the \Proc or \Array of Procs that are to be called
for converting each output field.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:write_converters) # => nil
With no write converter:
str = CSV.generate_line(["\na\n", "\tb\t", " c "])
str # => "\"\na\n\",\tb\t, c \n"
With a write converter:
strip_converter = lambda {|field| field.strip }
str = CSV.generate_line(["\na\n", "\tb\t", " c "], write_converters: strip_converter)
str # => "a,b,c\n"
With two write converters (called in order):
upcase_converter = lambda {|field| field.upcase }
downcase_converter = lambda {|field| field.downcase }
write_converters = [upcase_converter, downcase_converter]
str = CSV.generate_line(['a', 'b', 'c'], write_converters: write_converters)
str # => "a,b,c\n"
---
Raises an exception if the converter returns a value that is neither +nil+
nor \String-convertible:
bad_converter = lambda {|field| BasicObject.new }
# Raises NoMethodError (undefined method `is_a?' for #<BasicObject:>)
CSV.generate_line(['a', 'b', 'c'], write_converters: bad_converter)
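A write converter receives each field as the original object, before it is converted to a \String for output, so it can format non-\String values. A sketch with a made-up converter, +two_places+, that formats only Floats:

```ruby
require "csv"

# Illustrative write converter: format Floats with two decimal places and
# pass every other field through unchanged.
two_places = ->(field) { field.is_a?(Float) ? format("%.2f", field) : field }

str = CSV.generate_line(["pi", 3.14159, 2], write_converters: two_places)
p str # => "pi,3.14,2\n"
```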

@ -0,0 +1,15 @@
====== Option +write_empty_value+
Specifies the object that is to be substituted for each field
that has an empty \String.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:write_empty_value) # => ""
Without the option:
str = CSV.generate_line(['a', '', 'c', ''])
str # => "a,\"\",c,\"\"\n"
With the option:
str = CSV.generate_line(['a', '', 'c', ''], write_empty_value: "x")
str # => "a,x,c,x\n"

@ -0,0 +1,29 @@
====== Option +write_headers+
Specifies the boolean that determines whether a header row is included in the output;
ignored if there are no headers.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:write_headers) # => nil
Without +write_headers+:
file_path = 't.csv'
CSV.open(file_path,'w',
:headers => ['Name','Value']
) do |csv|
csv << ['foo', '0']
end
CSV.open(file_path) do |csv|
csv.shift
end # => ["foo", "0"]
With +write_headers+:
CSV.open(file_path,'w',
:write_headers=> true,
:headers => ['Name','Value']
) do |csv|
csv << ['foo', '0']
end
CSV.open(file_path) do |csv|
csv.shift
end # => ["Name", "Value"]
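The same option works when generating to a \String. In this sketch, with made-up data, the second row is a \Hash; the generator writes its values in header order rather than key order:

```ruby
require "csv"

headers = ["Name", "Value"]
str = CSV.generate(headers: headers, write_headers: true) do |csv|
  csv << ["foo", "0"]
  # A Hash row is written in header order, whatever its key order.
  csv << { "Value" => "1", "Name" => "bar" }
end
p str # => "Name,Value\nfoo,0\nbar,1\n"
```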

@ -0,0 +1,14 @@
====== Option +write_nil_value+
Specifies the object that is to be substituted for each +nil+ field.
Default value:
CSV::DEFAULT_OPTIONS.fetch(:write_nil_value) # => nil
Without the option:
str = CSV.generate_line(['a', nil, 'c', nil])
str # => "a,,c,\n"
With the option:
str = CSV.generate_line(['a', nil, 'c', nil], write_nil_value: "x")
str # => "a,x,c,x\n"
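Options +write_nil_value+ and +write_empty_value+ can be combined. In this sketch, +nil+ is written as the placeholder <tt>NULL</tt> (an arbitrary choice), and setting +write_empty_value+ to +nil+ writes empty \Strings as unquoted empty fields instead of <tt>""</tt>:

```ruby
require "csv"

# nil becomes "NULL"; an empty String becomes nil, which is then written
# as an unquoted empty field.
str = CSV.generate_line(["a", nil, "", "b"], write_nil_value: "NULL", write_empty_value: nil)
p str # => "a,NULL,,b\n"
```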

@ -103,7 +103,6 @@ require_relative "csv/writer"
using CSV::MatchP if CSV.const_defined?(:MatchP)
#
# This class provides a complete interface to CSV files and data. It offers
# tools to enable you to read and write to and from Strings or IO objects, as
# needed.
@ -179,9 +178,89 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
# CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
# CSV($stdin) { |csv_in| csv_in.each { |row| p row } } # from $stdin
#
# == Data Conversion
# == Options
#
# === CSV with headers
# The default values for options are:
# DEFAULT_OPTIONS = {
# # For both parsing and generating.
# col_sep: ",",
# row_sep: :auto,
# quote_char: '"',
# # For parsing.
# field_size_limit: nil,
# converters: nil,
# unconverted_fields: nil,
# headers: false,
# return_headers: false,
# header_converters: nil,
# skip_blanks: false,
# skip_lines: nil,
# liberal_parsing: false,
# nil_value: nil,
# empty_value: "",
# # For generating.
# write_headers: nil,
# quote_empty: true,
# force_quotes: false,
# write_converters: nil,
# write_nil_value: nil,
# write_empty_value: "",
# strip: false,
# }
#
# === Options for Parsing
#
# :include: ../doc/col_sep.rdoc
#
# :include: ../doc/row_sep.rdoc
#
# :include: ../doc/quote_char.rdoc
#
# :include: ../doc/field_size_limit.rdoc
#
# :include: ../doc/converters.rdoc
#
# :include: ../doc/unconverted_fields.rdoc
#
# :include: ../doc/headers.rdoc
#
# :include: ../doc/return_headers.rdoc
#
# :include: ../doc/header_converters.rdoc
#
# :include: ../doc/skip_blanks.rdoc
#
# :include: ../doc/skip_lines.rdoc
#
# :include: ../doc/liberal_parsing.rdoc
#
# :include: ../doc/nil_value.rdoc
#
# :include: ../doc/empty_value.rdoc
#
# === Options for Generating
#
# :include: ../doc/col_sep.rdoc
#
# :include: ../doc/row_sep.rdoc
#
# :include: ../doc/quote_char.rdoc
#
# :include: ../doc/write_headers.rdoc
#
# :include: ../doc/force_quotes.rdoc
#
# :include: ../doc/quote_empty.rdoc
#
# :include: ../doc/write_converters.rdoc
#
# :include: ../doc/write_nil_value.rdoc
#
# :include: ../doc/write_empty_value.rdoc
#
# :include: ../doc/strip.rdoc
#
# == CSV with headers
#
# CSV allows you to specify column names of a CSV file, whether they are in the data, or
# provided separately. If headers are specified, reading methods return an instance
@ -203,22 +282,205 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
# data = CSV.parse('Bob,Engineering,1000', headers: %i[name department salary])
# data.first #=> #<CSV::Row name:"Bob" department:"Engineering" salary:"1000">
#
# === Typed data reading
# == \CSV \Converters
#
# CSV allows to provide a set of data _converters_ e.g. transformations to try on input
# data. Converter could be a symbol from CSV::Converters constant's keys, or lambda.
# By default, each field parsed by \CSV is formed into a \String.
# You can use a _converter_ to convert certain fields into other Ruby objects.
#
# # Without any converters:
# CSV.parse('Bob,2018-03-01,100')
# #=> [["Bob", "2018-03-01", "100"]]
# When you specify a converter for parsing,
# each parsed field is passed to the converter;
# its return value becomes the new value for the field.
# A converter might, for example, convert an integer embedded in a \String
# into a true \Integer.
# (In fact, that's what built-in field converter +:integer+ does.)
#
# # With built-in converters:
# CSV.parse('Bob,2018-03-01,100', converters: %i[numeric date])
# #=> [["Bob", #<Date: 2018-03-01>, 100]]
# There are additional built-in \converters, and custom \converters are also supported.
#
# # With custom converters:
# CSV.parse('Bob,2018-03-01,100', converters: [->(v) { Time.parse(v) rescue v }])
# #=> [["Bob", 2018-03-01 00:00:00 +0200, "100"]]
# All \converters try to transcode fields to UTF-8 before converting.
# The conversion will fail if the data cannot be transcoded, leaving the field unchanged.
#
# === Field \Converters
#
# There are three ways to use field \converters;
# these examples use built-in field converter +:integer+,
# which converts each parsed integer string to a true \Integer.
#
# Option +converters+ with a singleton parsing method:
# ary = CSV.parse_line('0,1,2', converters: :integer)
# ary # => [0, 1, 2]
#
# Option +converters+ with a new \CSV instance:
# csv = CSV.new('0,1,2', converters: :integer)
# # Field converters in effect:
# csv.converters # => [:integer]
# csv.shift # => [0, 1, 2]
#
# Method #convert adds a field converter to a \CSV instance:
# csv = CSV.new('0,1,2')
# # Add a converter.
# csv.convert(:integer)
# csv.converters # => [:integer]
# csv.shift # => [0, 1, 2]
#
# ---
#
# The built-in field \converters are in \Hash CSV::Converters.
# The \Symbol keys there are the names of the \converters:
#
# CSV::Converters.keys # => [:integer, :float, :numeric, :date, :date_time, :all]
#
# Converter +:integer+ converts each field that +Integer()+ accepts:
# data = '0,1,2,x'
# # Without the converter
# csv = CSV.parse_line(data)
# csv # => ["0", "1", "2", "x"]
# # With the converter
# csv = CSV.parse_line(data, converters: :integer)
# csv # => [0, 1, 2, "x"]
#
# Converter +:float+ converts each field that +Float()+ accepts:
# data = '1.0,3.14159,x'
# # Without the converter
# csv = CSV.parse_line(data)
# csv # => ["1.0", "3.14159", "x"]
# # With the converter
# csv = CSV.parse_line(data, converters: :float)
# csv # => [1.0, 3.14159, "x"]
#
# Converter +:numeric+ converts with both +:integer+ and +:float+.
#
# Converter +:date+ converts each field that +Date::parse()+ accepts:
# data = '2001-02-03,x'
# # Without the converter
# csv = CSV.parse_line(data)
# csv # => ["2001-02-03", "x"]
# # With the converter
# csv = CSV.parse_line(data, converters: :date)
# csv # => [#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, "x"]
#
# Converter +:date_time+ converts each field that +DateTime::parse()+ accepts:
# data = '2020-05-07T14:59:00-05:00,x'
# # Without the converter
# csv = CSV.parse_line(data)
# csv # => ["2020-05-07T14:59:00-05:00", "x"]
# # With the converter
# csv = CSV.parse_line(data, converters: :date_time)
# csv # => [#<DateTime: 2020-05-07T14:59:00-05:00 ((2458977j,71940s,0n),-18000s,2299161j)>, "x"]
#
# Converter +:all+ converts with both +:date_time+ and +:numeric+.
#
# As seen above, method #convert adds \converters to a \CSV instance,
# and method #converters returns an \Array of the \converters in effect:
# csv = CSV.new('0,1,2')
# csv.converters # => []
# csv.convert(:integer)
# csv.converters # => [:integer]
# csv.convert(:date)
# csv.converters # => [:integer, :date]
#
# You can add a custom field converter to \Hash CSV::Converters:
# strip_converter = proc {|field| field.strip}
# CSV::Converters[:strip] = strip_converter
# CSV::Converters.keys # => [:integer, :float, :numeric, :date, :date_time, :all, :strip]
#
# Then use it to convert fields:
# str = ' foo , 0 '
# ary = CSV.parse_line(str, converters: :strip)
# ary # => ["foo", "0"]
#
# See {Custom Converters}[#class-CSV-label-Custom+Converters].
#
# === Header \Converters
#
# Header converters operate only on headers (and not on other rows).
#
# There are three ways to use header \converters;
# these examples use built-in header converter +:downcase+,
# which downcases each parsed header.
#
# Option +header_converters+ with a singleton parsing method:
# str = "Name,Count\nFoo,0\nBar,1\nBaz,2"
# tbl = CSV.parse(str, headers: true, header_converters: :downcase)
# tbl.class # => CSV::Table
# tbl.headers # => ["name", "count"]
#
# Option +header_converters+ with a new \CSV instance:
# csv = CSV.new(str, headers: true, header_converters: :downcase)
# # Header converters in effect:
# csv.header_converters # => [:downcase]
# tbl = csv.read
# tbl.headers # => ["name", "count"]
#
# Method #header_convert adds a header converter to a \CSV instance:
# csv = CSV.new(str, headers: true)
# # Add a header converter.
# csv.header_convert(:downcase)
# csv.header_converters # => [:downcase]
# tbl = csv.read
# tbl.headers # => ["name", "count"]
#
# ---
#
# The built-in header \converters are in \Hash CSV::HeaderConverters.
# The \Symbol keys there are the names of the \converters:
#
# CSV::HeaderConverters.keys # => [:downcase, :symbol]
#
# Converter +:downcase+ converts each header by downcasing it:
# str = "Name,Count\nFoo,0\nBar,1\nBaz,2"
# tbl = CSV.parse(str, headers: true, header_converters: :downcase)
# tbl.class # => CSV::Table
# tbl.headers # => ["name", "count"]
#
# Converter +:symbol+ converts each header by making it into a \Symbol:
# str = "Name,Count\nFoo,0\nBar,1\nBaz,2"
# tbl = CSV.parse(str, headers: true, header_converters: :symbol)
# tbl.headers # => [:name, :count]
# Details:
# - Strips leading and trailing whitespace.
# - Downcases the header.
# - Replaces embedded spaces with underscores.
# - Removes non-word characters.
# - Makes the string into a \Symbol.
#
# You can add a custom header converter to \Hash CSV::HeaderConverters:
# strip_converter = proc {|field| field.strip}
# CSV::HeaderConverters[:strip] = strip_converter
# CSV::HeaderConverters.keys # => [:downcase, :symbol, :strip]
#
# Then use it to convert headers:
# str = " Name , Value \nfoo,0\nbar,1\nbaz,2"
# tbl = CSV.parse(str, headers: true, header_converters: :strip)
# tbl.headers # => ["Name", "Value"]
#
# See {Custom Converters}[#class-CSV-label-Custom+Converters].
#
# === Custom \Converters
#
# You can define custom \converters.
#
# The \converter is a \Proc that is called with two arguments,
# \String +field+ and CSV::FieldInfo +field_info+;
# its return value becomes the field value:
# converter = proc {|field, field_info| <some_string> }
#
# To illustrate:
# converter = proc {|field, field_info| p [field, field_info]; field}
# ary = CSV.parse_line('foo,0', converters: converter)
#
# Produces:
# ["foo", #<struct CSV::FieldInfo index=0, line=1, header=nil>]
# ["0", #<struct CSV::FieldInfo index=1, line=1, header=nil>]
#
# In each of the output lines:
# - The first \Array element is the passed \String field.
# - The second is a \FieldInfo structure containing information about the field:
# - The 0-based column index.
# - The 1-based line number.
# - The header for the column, if available.
#
# If the \converter does not need +field_info+, it can be omitted:
# converter = proc {|field| ... }
#
# == CSV and Character Encodings (M17n or Multilingualization)
#
@ -380,29 +642,13 @@ class CSV
gsub(/\s+/, "_").to_sym
}
}
#
# The options used when no overrides are given by calling code. They are:
#
# <b><tt>:col_sep</tt></b>:: <tt>","</tt>
# <b><tt>:row_sep</tt></b>:: <tt>:auto</tt>
# <b><tt>:quote_char</tt></b>:: <tt>'"'</tt>
# <b><tt>:field_size_limit</tt></b>:: +nil+
# <b><tt>:converters</tt></b>:: +nil+
# <b><tt>:unconverted_fields</tt></b>:: +nil+
# <b><tt>:headers</tt></b>:: +false+
# <b><tt>:return_headers</tt></b>:: +false+
# <b><tt>:header_converters</tt></b>:: +nil+
# <b><tt>:skip_blanks</tt></b>:: +false+
# <b><tt>:force_quotes</tt></b>:: +false+
# <b><tt>:skip_lines</tt></b>:: +nil+
# <b><tt>:liberal_parsing</tt></b>:: +false+
# <b><tt>:quote_empty</tt></b>:: +true+
#
# Default values for method options.
DEFAULT_OPTIONS = {
# For both parsing and generating.
col_sep: ",",
row_sep: :auto,
quote_char: '"',
# For parsing.
field_size_limit: nil,
converters: nil,
unconverted_fields: nil,
@ -410,10 +656,18 @@ class CSV
return_headers: false,
header_converters: nil,
skip_blanks: false,
force_quotes: false,
skip_lines: nil,
liberal_parsing: false,
nil_value: nil,
empty_value: "",
# For generating.
write_headers: nil,
quote_empty: true,
force_quotes: false,
write_converters: nil,
write_nil_value: nil,
write_empty_value: "",
strip: false,
}.freeze
class << self
@ -423,6 +677,9 @@ class CSV
# the same +data+ object (tested by Object#object_id()) with the same
# +options+.
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# If a block is given, the instance is passed to the block and the return
# value becomes the return value of the block.
#
@ -463,6 +720,9 @@ class CSV
# <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
# are assigned to both objects.
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# The <tt>:output_row_sep</tt> +option+ defaults to
# <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
#
@ -496,6 +756,8 @@ class CSV
# pass a +path+ and any +options+ you wish to set for the read. Each row of
# file will be passed to the provided +block+ in turn.
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
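#
# For example, a sketch that assumes a file named +t.csv+ (a hypothetical
# path) can be created in the current directory:
#   File.write('t.csv', "a,b\nc,d\n")
#   rows = []
#   CSV.foreach('t.csv') { |row| rows << row }
#   rows # => [["a", "b"], ["c", "d"]]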
#
# The +options+ parameter can be anything CSV::new() understands. This method
# also understands an additional <tt>:encoding</tt> parameter that you can use
# to specify the Encoding of the data in the file to be read. You must provide
@ -525,10 +787,11 @@ class CSV
# Note that a passed String *is* modified by this method. Call dup() before
# passing if you need a new String.
#
# See {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# This method has one additional option: <tt>:encoding</tt>,
# which sets the base Encoding for the output if no +str+ is specified.
# CSV needs this hint if you plan to output non-ASCII compatible data.
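#
# For example, a sketch. Note that the second call appends to, and so
# modifies, the \String it is given:
#   CSV.generate { |csv| csv << ['a', 'b'] } # => "a,b\n"
#   str = +"x,y\n"
#   CSV.generate(str) { |csv| csv << ['z', 'w'] }
#   str # => "x,y\nz,w\n"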
#
def generate(str=nil, **options)
encoding = options[:encoding]
@ -550,8 +813,9 @@ class CSV
# This method is a shortcut for converting a single row (Array) into a CSV
# String.
#
# See {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# This method accepts an additional option, <tt>:encoding</tt>, which sets the base
# Encoding for the output. This method will try to guess your Encoding from
# the first non-+nil+ field in +row+, if possible, but you may need to use
# this parameter as a backup plan.
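#
# For example (a sketch; the last call shows a field that must be quoted
# because it contains the column separator):
#   CSV.generate_line(['a', 'b', 'c'])               # => "a,b,c\n"
#   CSV.generate_line(['a', 'b', 'c'], col_sep: ':') # => "a:b:c\n"
#   CSV.generate_line(['a,b', nil, 'c'])             # => "\"a,b\",,c\n"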
@ -581,8 +845,9 @@ class CSV
# as the primary interface for writing a CSV file.
#
# You must pass a +filename+ and may optionally add a +mode+ for Ruby's
# open().
#
# See {Options for Generating}[#class-CSV-label-Options+for+Generating].
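#
# For example, a sketch that writes a file and reads it back; +t.csv+ is
# a hypothetical path:
#   CSV.open('t.csv', 'w') { |csv| csv << ['a', 'b'] }
#   CSV.open('t.csv') { |csv| csv.read } # => [["a", "b"]]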
#
# This method works like Ruby's open() call, in that it will pass a CSV object
# to a provided block and close it when the block terminates, or it will
@ -674,8 +939,8 @@ class CSV
# provide a +block+ which will be called with each row of the String in turn,
# or just use the returned Array of Arrays (when no +block+ is given).
#
# You pass your +str+ to read from, and optional +options+.
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
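#
# For example (a sketch): without a block the rows are returned; with a
# block each row is yielded in turn:
#   CSV.parse("a,b\nc,d\n") # => [["a", "b"], ["c", "d"]]
#   CSV.parse("a,b\nc,d\n") { |row| p row }
#   # Output:
#   # ["a", "b"]
#   # ["c", "d"]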
#
def parse(str, **options, &block)
csv = new(str, **options)
@ -695,7 +960,7 @@ class CSV
# an Array. Note that if +line+ contains multiple rows, anything beyond the
# first row is ignored.
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
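#
# For example (a sketch); only the first row of the input is returned:
#   CSV.parse_line("a,b,c\n")               # => ["a", "b", "c"]
#   CSV.parse_line("a:b:c\n", col_sep: ':') # => ["a", "b", "c"]
#   CSV.parse_line("a,b\nc,d\n")            # => ["a", "b"]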
#
def parse_line(line, **options)
new(line, **options).shift
@ -703,7 +968,10 @@ class CSV
#
# Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
# file and +options+.
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
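#
# For example, a sketch that assumes +t.csv+ (a hypothetical path) holds
# two rows:
#   File.write('t.csv', "a,b\nc,d\n")
#   CSV.read('t.csv') # => [["a", "b"], ["c", "d"]]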
#
# This method also understands an additional <tt>:encoding</tt> parameter
# that you can use to specify the
# Encoding of the data in the file to be read. You must provide this unless
# your data is in Encoding::default_external(). CSV will use this to determine
@ -728,6 +996,7 @@ class CSV
# converters: :numeric,
# header_converters: :symbol }.merge(options) )
#
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
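#
# For example, a sketch (+t.csv+ is a hypothetical path) showing the
# effect of those defaults:
#   File.write('t.csv', "Name,Count\nfoo,1\nbar,2\n")
#   table = CSV.table('t.csv')
#   table.headers  # => [:name, :count]
#   table[:count]  # => [1, 2]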
def table(path, **options)
default_options = {
headers: true,
@ -750,171 +1019,8 @@ class CSV
# reading). If you want it at the end (for writing), use CSV::generate().
# If you want any other positioning, pass a preset StringIO object instead.
#
# You may set any reading and/or writing preferences in the +options+ Hash.
# Available options are:
#
# <b><tt>:col_sep</tt></b>:: The String placed between each field.
# This String will be transcoded into
# the data's Encoding before parsing.
# <b><tt>:row_sep</tt></b>:: The String appended to the end of each
# row. This can be set to the special
# <tt>:auto</tt> setting, which requests
# that CSV automatically discover this
# from the data. Auto-discovery reads
# ahead in the data looking for the next
# <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
# <tt>"\r"</tt> sequence. A sequence
# will be selected even if it occurs in
# a quoted field, assuming that you
# would have the same line endings
# there. If none of those sequences is
# found, +data+ is <tt>ARGF</tt>,
# <tt>STDIN</tt>, <tt>STDOUT</tt>, or
# <tt>STDERR</tt>, or the stream is only
# available for output, the default
# <tt>$INPUT_RECORD_SEPARATOR</tt>
# (<tt>$/</tt>) is used. Discovery
# takes a little time, so set this
# manually if speed is important. Also
# note that IO objects should be opened
# in binary mode on Windows if this
# feature will be used, because
# line-ending translation can cause
# problems with resetting the document
# position to where it was before the
# read ahead. This String will be
# transcoded into the data's Encoding
# before parsing.
# <b><tt>:quote_char</tt></b>:: The character used to quote fields.
# This has to be a single-character
# String. Setting it is useful for
# applications that incorrectly use
# <tt>'</tt> as the quote character
# instead of the correct <tt>"</tt>.
# CSV will always consider a double
# sequence of this character to be an
# escaped quote. This String will be
# transcoded into the data's Encoding
# before parsing.
# <b><tt>:field_size_limit</tt></b>:: The maximum size CSV will read
# ahead looking for the closing quote
# for a field. (In truth, it reads to
# the first line ending beyond this
# size.) If a quote cannot be found
# within the limit, CSV will raise a
# MalformedCSVError, assuming the data
# is faulty. You can use this limit to
# prevent what are effectively DoS
# attacks on the parser. However, this
# limit can cause a legitimate parse to
# fail and thus is set to +nil+, or off,
# by default.
# <b><tt>:converters</tt></b>:: An Array of names from the Converters
# Hash and/or lambdas that handle custom
# conversion. A single converter
# doesn't have to be in an Array. All
# built-in converters try to transcode
# fields to UTF-8 before converting.
# The conversion will fail if the data
# cannot be transcoded, leaving the
# field unchanged.
# <b><tt>:unconverted_fields</tt></b>:: If set to +true+, an
# unconverted_fields() method will be
# added to all returned rows (Array or
# CSV::Row) that will return the fields
# as they were before conversion. Note
# that <tt>:headers</tt> supplied by
# Array or String were not fields of the
# document and thus will have an empty
# Array attached.
# <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
# +true+, the initial row of the CSV
# file will be treated as a row of
# headers. If set to an Array, the
# contents will be used as the headers.
# If set to a String, the String is run
# through a call of CSV::parse_line()
# with the same <tt>:col_sep</tt>,
# <tt>:row_sep</tt>, and
# <tt>:quote_char</tt> as this instance
# to produce an Array of headers. This
# setting causes CSV#shift() to return
# rows as CSV::Row objects instead of
# Arrays and CSV#read() to return
# CSV::Table objects instead of an Array
# of Arrays.
# <b><tt>:return_headers</tt></b>:: When +false+, header rows are silently
# swallowed. If set to +true+, header
# rows are returned in a CSV::Row object
# with identical headers and
# fields (save that the fields do not go
# through the converters).
# <b><tt>:write_headers</tt></b>:: When +true+ and <tt>:headers</tt> is
# set, a header row will be added to the
# output.
# <b><tt>:header_converters</tt></b>:: Identical in functionality to
# <tt>:converters</tt> save that the
# conversions are only made to header
# rows. All built-in converters try to
# transcode headers to UTF-8 before
# converting. The conversion will fail
# if the data cannot be transcoded,
# leaving the header unchanged.
# <b><tt>:skip_blanks</tt></b>:: When set to a +true+ value, CSV will
# skip over any empty rows. Note that
# this setting will not skip rows that
# contain column separators, even if
# the rows contain no actual data. If
# you want to skip rows that contain
# separators but no content, consider
# using <tt>:skip_lines</tt>, or
# inspecting fields.compact.empty? on
# each row.
# <b><tt>:force_quotes</tt></b>:: When set to a +true+ value, CSV will
# quote all CSV fields it creates.
# <b><tt>:skip_lines</tt></b>:: When set to an object responding to
# <tt>match</tt>, every line matching
# it is considered a comment and ignored
# during parsing. When set to a String,
# it is first converted to a Regexp.
# When set to +nil+, no line is considered
# a comment. If the passed object does
# not respond to <tt>match</tt>,
# <tt>ArgumentError</tt> is raised.
# <b><tt>:liberal_parsing</tt></b>:: When set to a +true+ value, CSV will
# attempt to parse input not conformant
# with RFC 4180, such as double quotes
# in unquoted fields.
# <b><tt>:nil_value</tt></b>:: When set to an object, the value of
# each empty field is replaced by that
# object instead of +nil+.
# <b><tt>:empty_value</tt></b>:: When set to an object, the value of
# each empty-string field is replaced
# by that object.
# <b><tt>:quote_empty</tt></b>:: When set to a +true+ value, CSV will
# quote empty values with double quotes.
# When +false+, CSV will emit an
# empty string for an empty field value.
# <b><tt>:write_converters</tt></b>:: Converts values on each line with the
# specified <tt>Proc</tt> object(s),
# which receive a <tt>String</tt> value
# and return a <tt>String</tt> or +nil+
# value.
# When an array is specified, each
# converter will be applied in order.
# <b><tt>:write_nil_value</tt></b>:: When set to a <tt>String</tt>, +nil+
# values on each line are replaced
# with the specified value.
# <b><tt>:write_empty_value</tt></b>:: When set to a <tt>String</tt> or +nil+,
# empty values on each line are
# replaced with the specified value.
# <b><tt>:strip</tt></b>:: When set to a +true+ value, CSV will
# strip " \t\f\v" around the values.
# If you specify a single-character
# String instead of +true+, CSV will
# strip that character instead.
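#
# For example, a sketch exercising a few of the options above on
# made-up data:
#   CSV.parse_line("a,,\"\"", nil_value: 'NIL', empty_value: 'EMPTY')
#   # => ["a", "NIL", "EMPTY"]
#   CSV.parse_line(" a , b ", strip: true)                 # => ["a", "b"]
#   CSV.parse("# note\na,b\n", skip_lines: /\A#/)          # => [["a", "b"]]
#   CSV.generate_line(['a', nil, ''])                      # => "a,,\"\"\n"
#   CSV.generate_line(['a', nil, ''], quote_empty: false)  # => "a,,\n"
#   CSV.generate_line(['a', 'b'], force_quotes: true)      # => "\"a\",\"b\"\n"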
#
# See CSV::DEFAULT_OPTIONS for the default settings.
# See {Options for Parsing}[#class-CSV-label-Options+for+Parsing]
# and {Options for Generating}[#class-CSV-label-Options+for+Generating].
#
# Options cannot be overridden in the instance methods for performance reasons,
# so be sure to set what you want here.