Remove source expression deduplication. (#499)

This PR removes `dedup_source_list` and replaces it with a simple
`.uniq` call. This resolves
https://github.com/github/secure_headers/issues/491, which is only the
latest in a series of ongoing issues with source expression
deduplication.

[Since
2015](32bb3f51e8),
`secure_headers` has had a feature that [deduplicates redundant URL source
expressions](494b75ff92/lib/secure_headers/headers/content_security_policy.rb (L157-L170)).
For example, if `*.github.com` is listed as a source expression for a
given
[directive](https://w3c.github.io/webappsec-csp/#framework-directives),
then the addition of `example.github.com` would have no effect, and so
the latter can be safely removed by `secure_headers` to save bytes.
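
As a rough illustration of the behavior being removed, the sketch below contrasts wildcard-based deduplication with a plain `.uniq`. The `wildcard_dedup` helper is hypothetical and deliberately simplified (it ignores schemes and assumes bare host expressions); it is not the gem's code.

```ruby
# Hypothetical, simplified stand-in for wildcard-based deduplication.
# Assumption: sources are bare host expressions with no scheme, port, or path.
def wildcard_dedup(sources)
  sources = sources.uniq
  wildcards = sources.select { |source| source.include?("*") }
  sources.reject do |source|
    # Drop a source only if it is not itself a wildcard and some wildcard globs it.
    !wildcards.include?(source) &&
      wildcards.any? { |pattern| File.fnmatch(pattern, source) }
  end
end

sources = %w(*.github.com example.github.com example.github.com)

wildcard_dedup(sources) # => ["*.github.com"]
sources.uniq            # => ["*.github.com", "example.github.com"]
```

With `.uniq` alone, only exact repeats are dropped, so the header is a few bytes longer but no matching logic is involved.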

Unfortunately, this implementation has had various bugs due to the use
of "impedance mismatched" APIs like
[`URI`](https://docs.ruby-lang.org/en/2.1.0/URI.html)[^1] and
[`File.fnmatch`](https://apidock.com/ruby/v2_5_5/File/fnmatch/class)[^2].
For example, it made incorrect assumptions about source expression
schemes, leading to the following series of events:

[^1]: Which allows wildcards in domains but not for ports, as it is not
designed to parse URL source expressions.
[^2]: Which has general glob matching that is not designed for URL
source expressions either.

- 2017-03: A [bug was reported and
confirmed](https://github.com/github/secure_headers/issues/317)
- 2022-04: The bug was finally [fixed by `@keithamus` (a
Hubber)](https://github.com/github/secure_headers/pull/478) due to our use
of web sockets.
- 2022-06: This fix in turn triggered a [new
bug](https://github.com/github/secure_headers/issues/491) with source
expressions like `data:`.
- 2022-06: An external contributor [submitted a fix for the new
bug](https://github.com/github/secure_headers/pull/490), but this still
doesn't address some of the "fast and loose" semantic issues of the
underlying implementation.
- 2022-08: `@lgarron` [drafted a new
implementation](https://github.com/github/secure_headers/pull/498) that
semantically parses and compares source expressions based on the
specification for source expressions.
- This implementation already proved to have some value in early
testing, as its stricter validation caught an issue in `github.com`'s
CSP. However, it would take additional work to make this implementation
fully aware of CSP syntax (e.g. not allowing URL source expressions in a
source directive when only special keywords are allowed, and
vice-versa), and it relies on a new regex-based implementation of source
expression parsing that may very well lead to more subtle bugs.
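
To make the "impedance mismatch" with glob matching concrete, here is a small sketch using only `File.fnmatch`. It is an illustration of the general mismatch and is not claimed to reproduce the exact failure path of any of the linked issues.

```ruby
# File.fnmatch is a shell-style glob matcher; it has no notion of CSP syntax.

# The intended case: a wildcard host covers a subdomain.
subdomain_covered = File.fnmatch("*.example.org", "a.example.org") # => true

# A glob `*` also matches a scheme source such as `data:`,
# which a CSP `*` source expression does not cover.
data_covered = File.fnmatch("*", "data:") # => true
```

Any implementation built on globbing therefore needs extra checks layered around it (such as comparing schemes), and those checks are where the bugs above crept in.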

In effect, this is a half feature whose maintenance cost has outweighed
its functionality:

- The relevant code has suffered from continued bugs, as described
above.
- Deduplication is purely a "nice-to-have" — it is not necessary for the
security or correct functionality of `secure_headers`.
- It was [introduced by `@oreoshake` (the then-maintainer) without
explanation in
2015](32bb3f51e8) and was
never "officially" documented. We have no concrete data on whether it
has any performance impact on any real apps — for all we know, uncached
deduplication calculations might even cost more than the saved header
bytes.
- Further, in response to the first relevant bug, `@oreoshake` himself
[said](https://github.com/github/secure_headers/issues/317#issuecomment-283431124):

> I've never been a fan of the deduplication based on `*` anyways. Maybe
we should just rip that out.

> Like people trying to save a few bytes can optimize elsewhere.

So this PR completely removes the functionality. If we learn of a use
case where this was very important (and the app somehow can't preprocess
the list before passing it to `secure_headers`), we can always resume
consideration of one of:

- https://github.com/github/secure_headers/pull/490
- https://github.com/github/secure_headers/pull/498
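
For an app that does want wildcard-aware minification, one option is to preprocess its own list before handing it to `secure_headers`. The following is a minimal sketch under stated assumptions: `drop_covered_hosts` and `HOSTNAME` are hypothetical names, not part of the gem, and the helper is intentionally conservative (it only treats patterns of the exact form `*.domain` as wildcards and only ever removes bare hostnames).

```ruby
# Hypothetical app-side helper; not part of secure_headers.
# Assumption: only bare hostnames (no scheme, port, or path) may be removed,
# and only patterns of the exact form `*.domain` count as wildcards.
HOSTNAME = /\A[a-z0-9-]+(\.[a-z0-9-]+)*\z/i

def drop_covered_hosts(sources)
  # Collect suffixes such as ".example.org" from patterns like "*.example.org".
  suffixes = sources.filter_map do |source|
    rest = source.delete_prefix("*")
    rest if source.start_with?("*.") && HOSTNAME.match?(rest.delete_prefix("."))
  end

  sources.uniq.reject do |source|
    HOSTNAME.match?(source) && suffixes.any? { |suffix| source.end_with?(suffix) }
  end
end

default_src = drop_covered_hosts(%w(a.example.org b.example.org *.example.org data: example.org))
# => ["*.example.org", "data:", "example.org"]
```

Anything the helper does not recognize (scheme sources, ports, keywords, the bare domain itself) is passed through untouched, so the worst case is a slightly longer header rather than a dropped source.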

Lucas Garron 2022-10-24 12:03:34 -07:00, committed by GitHub
Parent a488b4d5ec cd73b02188
Commit b6ef2ed67a
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
2 changed files: 10 additions and 23 deletions


@@ -133,7 +133,7 @@ module SecureHeaders
         unless directive == REPORT_URI || @preserve_schemes
           source_list = strip_source_schemes(source_list)
         end
-        dedup_source_list(source_list)
+        source_list.uniq
       end
     end
@@ -151,24 +151,6 @@ module SecureHeaders
       end
     end
-    # Removes duplicates and sources that already match an existing wild card.
-    #
-    # e.g. *.github.com asdf.github.com becomes *.github.com
-    def dedup_source_list(sources)
-      sources = sources.uniq
-      wild_sources = sources.select { |source| source =~ STAR_REGEXP }
-      if wild_sources.any?
-        schemes = sources.map { |source| [source, URI(source).scheme] }.to_h
-        sources.reject do |source|
-          !wild_sources.include?(source) &&
-            wild_sources.any? { |pattern| schemes[pattern] == schemes[source] && File.fnmatch(pattern, source) }
-        end
-      else
-        sources
-      end
-    end
     # Private: append a nonce to the script/style directories if script_nonce
     # or style_nonce are provided.
     def populate_nonces(directive, source_list)


@@ -48,12 +48,12 @@ module SecureHeaders
         expect(csp.value).to eq("default-src * 'unsafe-inline' 'unsafe-eval' data: blob:")
       end
-      it "minifies source expressions based on overlapping wildcards" do
+      it "does not minify source expressions based on overlapping wildcards" do
         config = {
           default_src: %w(a.example.org b.example.org *.example.org https://*.example.org)
         }
         csp = ContentSecurityPolicy.new(config)
-        expect(csp.value).to eq("default-src *.example.org")
+        expect(csp.value).to eq("default-src a.example.org b.example.org *.example.org")
       end
       it "removes http/s schemes from hosts" do
@@ -101,8 +101,13 @@ module SecureHeaders
         expect(csp.value).to eq("default-src example.org; block-all-mixed-content")
       end
-      it "deduplicates any source expressions" do
-        csp = ContentSecurityPolicy.new(default_src: %w(example.org example.org example.org))
+      it "handles wildcard subdomain with wildcard port" do
+        csp = ContentSecurityPolicy.new(default_src: %w(https://*.example.org:*))
+        expect(csp.value).to eq("default-src *.example.org:*")
+      end
+      it "deduplicates source expressions that match exactly (after scheme stripping)" do
+        csp = ContentSecurityPolicy.new(default_src: %w(example.org https://example.org example.org))
         expect(csp.value).to eq("default-src example.org")
       end