Rationale
=========
The dotenv store as it exists right now splits its input on newlines
to determine where a new key-value pair or comment begins. This works
remarkably well, up until you need to handle values that contain
newlines.

While I couldn't find an official dotenv file format spec, I sampled a
number of open-source dotenv parsers, and it seems that they typically
apply the following rules:

Comments:
* Comments may be written by starting a line with the `#` character.

Newline handling:
* If a value is unquoted or single-quoted and contains the character
sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).
* If a value is double-quoted and contains the character sequence `\n`
(`0x5c6e`), it IS decoded to a line feed (`0x0a`).

Whitespace trimming:
* For comments, the whitespace immediately after the `#` character and any
trailing whitespace are trimmed.
* If a value is unquoted and contains any leading or trailing whitespace, it
is trimmed.
* If a value is either single- or double-quoted and contains any leading or
trailing whitespace, it is left untrimmed.

Quotation handling:
* If a value is surrounded by single- or double-quotes, the quotation marks
are interpreted and not included in the value.
* Any number of single-quote characters may appear in a double-quoted
value, or within a single-quoted value if they are escaped (e.g.,
`'foo\'bar'`).
* Any number of double-quote characters may appear in a single-quoted
value, or within a double-quoted value if they are escaped (e.g.,
`"foo\"bar"`).
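
As an illustration, here is a minimal Go sketch of how the quotation, newline and whitespace rules above might apply to one value. `decodeValue` is a hypothetical helper and is not part of SOPS. It assumes the value has already been separated from its key and is well-formed. Decoding `\\` to a single backslash inside double quotes is also an assumption, inferred from the first console example under Examples.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeValue is a hypothetical helper applying the rules above to one raw
// value (the text after `=`). Assumption: a quoted value starts and ends
// with the same quote character and is otherwise well-formed.
func decodeValue(raw string) string {
	s := strings.TrimSpace(raw)
	if len(s) < 2 || (s[0] != '"' && s[0] != '\'') || s[len(s)-1] != s[0] {
		// Unquoted: surrounding whitespace is trimmed and escape
		// sequences such as `\n` are kept verbatim.
		return s
	}
	quote := s[0]
	inner := s[1 : len(s)-1] // the quotes are not part of the value
	var b strings.Builder
	for i := 0; i < len(inner); i++ {
		c := inner[i]
		if c != '\\' || i+1 == len(inner) {
			b.WriteByte(c)
			continue
		}
		next := inner[i+1]
		switch {
		case next == quote:
			// `\'` inside single quotes, `\"` inside double quotes.
			b.WriteByte(quote)
			i++
		case quote == '"' && next == 'n':
			// Only double-quoted values decode `\n` to a line feed.
			b.WriteByte('\n')
			i++
		case quote == '"' && next == '\\':
			// Assumption: `\\` decodes to a single backslash.
			b.WriteByte('\\')
			i++
		default:
			// Any other backslash is kept as-is.
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", decodeValue(`  foo\nbar  `)) // "foo\\nbar"
	fmt.Printf("%q\n", decodeValue(`' foo\nbar '`)) // " foo\\nbar "
	fmt.Printf("%q\n", decodeValue(`"foo\nbar"`))   // "foo\nbar"
	fmt.Printf("%q\n", decodeValue(`'foo\'bar'`))   // "foo'bar"
	fmt.Printf("%q\n", decodeValue(`"foo\"bar"`))   // "foo\"bar"
}
```

Only the double-quoted branch turns `\n` into a line feed. Unquoted and single-quoted values keep it as the two characters `\` and `n`, and quoted values keep their inner whitespace.
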

Because single- and double-quoted values may contain actual newlines,
we cannot split our input data on newlines, as a newline may fall in the
middle of a quoted value. This, along with the other rules around
handling quoted values, prompted me to try to implement a more robust
parsing solution. This commit is my first stab at that.
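
One way to avoid splitting on newlines is to scan the input and end an entry only at a line feed that is not inside a quoted value. The Go sketch below is illustrative only:

* `scanEntries` and `entry` are hypothetical names.
* A quote is only treated as opening a quoted value when it is the first non-blank character after `=`.
* Skipping whitespace between entries is an assumption of this sketch.
* Anything after a closing quote on the same line is ignored, which is also an assumption.

Values are returned raw, with quotes and escapes intact, for a separate decoding step.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical representation of one item in a dotenv file.
type entry struct {
	isComment bool
	comment   string // comment text with surrounding whitespace trimmed
	key       string // key of a key-value pair
	raw       string // raw value: quotes and escape sequences left intact
}

// scanEntries walks the input instead of splitting it on newlines, so a
// line feed inside a quoted value does not terminate the entry.
func scanEntries(input string) ([]entry, error) {
	var out []entry
	n := len(input)
	for i := 0; i < n; {
		// Assumption: whitespace and blank lines between entries are skipped.
		if c := input[i]; c == ' ' || c == '\t' || c == '\r' || c == '\n' {
			i++
			continue
		}
		// End of the current physical line (or of the input).
		end := n
		if nl := strings.IndexByte(input[i:], '\n'); nl >= 0 {
			end = i + nl
		}
		if input[i] == '#' {
			// Comment: trim whitespace after `#` and any trailing whitespace.
			out = append(out, entry{isComment: true, comment: strings.TrimSpace(input[i+1 : end])})
			i = end + 1
			continue
		}
		eq := strings.IndexByte(input[i:end], '=')
		if eq < 0 {
			return nil, fmt.Errorf("invalid dotenv line: %q", input[i:end])
		}
		key := strings.TrimSpace(input[i : i+eq])
		j := i + eq + 1
		for j < end && (input[j] == ' ' || input[j] == '\t') {
			j++
		}
		if j < end && (input[j] == '"' || input[j] == '\'') {
			// Quoted value: scan to the matching unescaped quote, which
			// may be on a later line.
			quote := input[j]
			k := j + 1
			for k < n && input[k] != quote {
				if input[k] == '\\' {
					k++ // skip the escaped character
				}
				k++
			}
			if k >= n {
				return nil, fmt.Errorf("unterminated quoted value for key %q", key)
			}
			out = append(out, entry{key: key, raw: input[j : k+1]})
			// Assumption: the rest of the closing line is ignored.
			i = n
			if nl := strings.IndexByte(input[k+1:], '\n'); nl >= 0 {
				i = k + 1 + nl + 1
			}
			continue
		}
		// Unquoted value: runs to the end of the line.
		out = append(out, entry{key: key, raw: input[j:end]})
		i = end + 1
	}
	return out, nil
}

func main() {
	input := "# greeting \nFOO=\"hello\nworld\"\nBAR= plain\n"
	entries, err := scanEntries(input)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%+v\n", e)
	}
}
```

Here `FOO` comes back as a single entry whose raw value spans two physical lines, which naive newline splitting cannot produce.
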

Special Considerations
======================
This is _not_ a backwards-compatible change:
* The `dotenv` files produced by this version of SOPS _cannot_ be read
by an earlier version.
* The `dotenv` files produced by an earlier version of SOPS _can_ be
read by this version, with the understanding that the semantics around
quotations and newlines have changed.

Examples
========
The examples below show how escape sequences in double-quoted values are
decoded before being passed to the running environment:
```console
$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261 FOO=foo\nbar\nba
00000010: 7a0a z.
```
```console
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a FOO=foo.bar.baz.
```
When reading and writing dotenv files, we need to make sure to encode
and decode newline characters. SOPS does not currently do this, as the
following example shows:
```console
$ echo '{"foo": "foo\nbar\nbaz"}' > plaintext.json
$ sops -e --output ciphertext.json plaintext.json
$ sops -d --output-type dotenv ciphertext.json
foo=foo
bar
baz
```
This output is invalid and cannot even be fed back into SOPS:
```console
$ sops -d --output-type dotenv --output plaintext.env ciphertext.json
$ sops -e plaintext.env
Error unmarshalling file: invalid dotenv input line: bar
```
This commit fixes the issue, such that the
`sops -d --output-type dotenv ciphertext.json` command above now produces
valid output:
```console
$ sops -d --output-type dotenv ciphertext.json
foo=foo\nbar\nbaz
```
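
The encoding half of this fix can be sketched in Go as below. `emitLine` is a hypothetical helper, not SOPS's actual emitter. It only reproduces the single-line `foo=foo\nbar\nbaz` output shown above. It does this by replacing each line feed (`0x0a`) with the two-character sequence `\n` (`0x5c6e`).

```go
package main

import (
	"fmt"
	"strings"
)

// emitLine is a hypothetical helper that writes one key-value pair,
// encoding each line feed in the value as the two characters `\n` so the
// pair stays on a single line of the dotenv output.
func emitLine(key, value string) string {
	return key + "=" + strings.ReplaceAll(value, "\n", `\n`) + "\n"
}

func main() {
	fmt.Print(emitLine("foo", "foo\nbar\nbaz")) // prints foo=foo\nbar\nbaz
}
```

This simple replacement does not escape backslashes already present in the value. Once encoded, a value that literally contains `\n` becomes indistinguishable from one containing a line feed. A complete emitter would need to escape `\` as well, or quote such values.
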