This PR removes `dedup_source_list` and replaces it with a simple
`.uniq` call. This resolves
https://github.com/github/secure_headers/issues/491, which is only the
latest in a series of ongoing issues with source expression
deduplication.
`secure_headers` has had this feature [since
2015](32bb3f51e8)
that [deduplicates redundant URL source
expressions](494b75ff92/lib/secure_headers/headers/content_security_policy.rb (L157-L170)).
For example, if `*.github.com` is listed as a source expression for a
given
[directive](https://w3c.github.io/webappsec-csp/#framework-directives),
then the addition of `example.github.com` would have no effect, and so
the latter can be safely removed by `secure_headers` to save bytes.
Unfortunately, this implementation has had various bugs due to the use
of "impedance mismatched" APIs like
[`URI`](https://docs.ruby-lang.org/en/2.1.0/URI.html)[^1] and
[`File.fnmatch`](https://apidock.com/ruby/v2_5_5/File/fnmatch/class)[^2].
For example, it made incorrect assumptions about source expression
schemes, leading to the following series of events:
[^1]: Which allows wildcards in domains but not for ports, as it is not
designed to parse URL source expressions.
[^2]: Which has general glob matching that is not designed for URL
source expressions either.
- 2017-03: A [bug was reported and
confirmed](https://github.com/github/secure_headers/issues/317)
- 2022-04: The bug was finally [fixed by `@keithamus` (a Hubber) in
2022](https://github.com/github/secure_headers/pull/478) due to our use
of web sockets.
- 2022-06: This fix in turn triggered a [new
bug](https://github.com/github/secure_headers/issues/491) with source
expressions like `data:`.
- 2022-06: An external contributor [submitted a fix for the bew
bug](https://github.com/github/secure_headers/pull/490), but this still
doesn't address some of the "fast and loose" semantic issues of the
underlying implementation.
- 2022-08: `@lgarron` [drafted a new
implementation](https://github.com/github/secure_headers/pull/498) that
semantically parses and compares source expressions based on the
specification for source expressions.
- This implementation already proved to have some value in early
testing, as its stricter validation caught an issue in `github.com`'s
CSP. However, it would take additional work to make this implementation
fully aware of CSP syntax (e.g. not allowing URL source expressions in a
source directive when only special keywords are allowed, and
vice-versa), and it relies on a new regex-based implementation of source
expression parsing that may very well lead to more subtle bugs.
In effect, this is a half feature whose maintenance cost has outweighed
its functionality:
- The relevant code has suffered from continued bugs, described as
above.
- Deduplication is purely a "nice-to-have" — it is not necessary for the
security or correct functionality of `secure_headers`.
- It was [introduced by `@oreoshake` (the then-maintainer) without
explanation in
2015](32bb3f51e8),
never "officially" documented. We have no concrete data on whether it
has any performance impact on any real apps — for all we know, uncached
deduplication calculations might even cost more than the saved header
bytes.
- Further, in response to the first relevant bug, `@oreoshake` himself
[said](https://github.com/github/secure_headers/issues/317#issuecomment-283431124):
> I've never been a fan of the deduplication based on `*` anyways. Maybe
we should just rip that out.
> Like people trying to save a few bytes can optimize elsewhere.
So this PR completely removes the functionality. If we learn of a use
case where this was very important (and the app somehow can't preprocess
the list before passing it to `secure_headers`), we can always resume
consideration of one of:
- https://github.com/github/secure_headers/pull/490
- https://github.com/github/secure_headers/pull/498
In 5638cb0334 the gem was relicensed to
MIT, but it was incomplete. 86c762aea4 at
least fixed the README, but the gemspec itself was still forgotten.
Fixes: 5638cb0334
The helper outputs the closing </script> or </style> unindented
if it is on its own line. Needed to remove any extra whitespace
the regex captures on that last line.