Set REMOTE_ADDR correctly when requests are going through a CDN Shield (#22603)
* Set REMOTE_ADDR correctly when requests are going through a CDN Shield https://www.fastly.com/documentation/guides/concepts/shielding/ CDN Shield is an additional PoP requests can go through, and their IP is added to X-Forwarded-For so we need to look for the client IP in a different place. The special X-AMO-Request-Shielded header is set to "true" in that case. * Tweak documentation about IP addresses and move them from middleware docstring to their own doc
This commit is contained in:
Родитель
b9dea9f31d
Коммит
697bffd6a9
|
@ -14,6 +14,7 @@ refers to this project.
|
|||
topics/readme_include
|
||||
topics/api/index
|
||||
topics/development/index
|
||||
topics/remote_addr
|
||||
topics/third-party
|
||||
topics/blocklist
|
||||
```
|
||||
|
|
|
@ -0,0 +1,73 @@
|
|||
(remote_addr)=
|
||||
|
||||
# IP addresses
|
||||
|
||||
AMO records IP addresses for various user actions, allowing us to correlate
|
||||
user activity to find malicious and abusive actors.
|
||||
|
||||
## Processing
|
||||
|
||||
`SetRemoteAddrFromForwardedFor` middleware is responsible for processing the
|
||||
various headers and meta information we receive and setting
|
||||
`META['REMOTE_ADDR']` on the `request` object passed to each view.
|
||||
|
||||
The request flow is either:
|
||||
Client -> CDN -> Load balancer -> WSGI proxy -> addons-server
|
||||
or
|
||||
Client -> CDN -> CDN shield -> Load balancer -> WSGI proxy -> addons-server
|
||||
or
|
||||
Client -> Load balancer -> WSGI proxy -> addons-server
|
||||
|
||||
Currently:
|
||||
- CDN is CloudFront or Fastly
|
||||
- CDN shield is an additional PoP on the CDN that request can go through
|
||||
(only enabled on Fastly)
|
||||
- Load Balancer is GKE Ingress (GCP)
|
||||
- WSGI proxy is nginx + uwsgi
|
||||
|
||||
CDN is set up to add a `X-Request-Via-CDN` header set to a secret value known
|
||||
to us so we can verify the request did originate from the CDN.
|
||||
|
||||
If the request was shielded by the CDN it sets the `X-AMO-Request-Shielded`
|
||||
header to `"true"`. This header should only be trusted if `X-Request-Via-CDN`
|
||||
has been verified already.
|
||||
|
||||
Nginx converts `X-Request-Via-CDN` and `X-Forwarded-For` to
|
||||
`HTTP_X_REQUEST_VIA_CDN` and `HTTP_X_FORWARDED_FOR` parameters, respectively.
|
||||
|
||||
The `X-Forwarded-For` header is potentially user input. When intermediary
|
||||
servers in the flow described above add their own IP to it, they are always
|
||||
appending to the list, so we can only trust specific positions starting
|
||||
from the right, anything else cannot be trusted.
|
||||
|
||||
CDN always makes origin requests with a `X-Forwarded-For` header set to
|
||||
"Client IP, CDN IP", so the client IP will be second to last for a CDN
|
||||
request. If the request was shielded, the shield PoP IP will be added so
|
||||
the client IP will be third to last.
|
||||
|
||||
On GCP, GKE Ingress appends its own IP to that header, resulting
|
||||
in a value of "Client IP, CDN IP, GKE Ingress IP" (or
|
||||
"Client IP, CDN IP, CDN Shield IP, GKE Ingress IP" for shielded requests),
|
||||
so the client IP will be third to last, or fourth to last if there was a
|
||||
CDN Shield.
|
||||
|
||||
We are no longer hosted on AWS, but it's worth noting that on AWS, the classic
|
||||
ELB we were using did not make any alterations to `X-Forwarded-For`. For this
|
||||
reason, we only shift the client IP position we look at to account for the
|
||||
Load Balancer if `DEPLOY_PLATFORM` environ variable is set to `"gcp"`.
|
||||
|
||||
If the request didn't come from the CDN and is a direct origin request, on
|
||||
AWS we can use `REMOTE_ADDR`, but on GCP we'd get the GKE Ingress IP, and the
|
||||
`X-Forwarded-For` value will be "Client IP, GKE Ingress IP", so the client IP
|
||||
will be second to last.
|
||||
|
||||
## Recording
|
||||
|
||||
Through several older models in the code have a dedicated field for this,
|
||||
more recent implementations should use `IPLog`, which is automatically
|
||||
populated when an `ActivityLog` action constant is defined with `store_ip` set
|
||||
to `True` (note that if the `keep` property isn't defined, we don't keep the
|
||||
activity forever and therefore ultimately delete the associated `IPLog`
|
||||
instance as well)
|
||||
|
||||
|
|
@ -286,42 +286,7 @@ class SetRemoteAddrFromForwardedFor(MiddlewareMixin):
|
|||
"""
|
||||
Set REMOTE_ADDR from HTTP_X_FORWARDED_FOR if necessary.
|
||||
|
||||
The request flow is either:
|
||||
Client -> CDN -> Load balancer -> WSGI proxy -> addons-server.
|
||||
or
|
||||
Client -> Load balancer -> WSGI proxy -> addons-server.
|
||||
|
||||
Currently:
|
||||
- CDN is CloudFront
|
||||
- Load Balancer is either a classic ELB (AWS) or GKE Ingress (GCP)
|
||||
- WSGI proxy is nginx + uwsgi
|
||||
|
||||
CloudFront is set up to add a X-Request-Via-CDN header set to a secret
|
||||
value known to us so we can verify the request did originate from the CDN.
|
||||
|
||||
Nginx converts X-Request-Via-CDN and X-Forwarded-For to
|
||||
HTTP_X_REQUEST_VIA_CDN and HTTP_X_FORWARDED_FOR, respectively.
|
||||
|
||||
The X-Forwarded-For header is potentially user input. When intermediary
|
||||
servers in the flow described above add their own IP to it, they are always
|
||||
appending to the list, so we can only trust specific positions starting
|
||||
from the right, anything else cannot be trusted.
|
||||
|
||||
CloudFront always makes origin requests with a X-Forwarded-For header
|
||||
set to "Client IP, CDN IP", so the client IP will be second to last for a
|
||||
CDN request.
|
||||
|
||||
On AWS, the classic ELB we're using does not make any alterations to
|
||||
X-Forwarded-For.
|
||||
|
||||
On GCP, GKE Ingress appends its own IP to that header, resulting
|
||||
in a value of "Client IP, CDN IP, GKE Ingress IP", so the client IP will be
|
||||
third to last.
|
||||
|
||||
If the request didn't come from the CDN and is a direct origin request, on
|
||||
AWS we can use REMOTE_ADDR, but on GCP we'd get the GKE Ingress IP, and the
|
||||
X-Forwarded-For value will be "Client IP, GKE Ingress IP", so the client IP
|
||||
will be second to last.
|
||||
See https://mozilla.github.io/addons-server/topics/remote_addr.html.
|
||||
"""
|
||||
|
||||
def is_request_from_cdn(self, request):
|
||||
|
@ -332,12 +297,27 @@ class SetRemoteAddrFromForwardedFor(MiddlewareMixin):
|
|||
def is_request_from_gcp_environement(self, request):
|
||||
return os.environ.get('DEPLOY_PLATFORM') == 'gcp'
|
||||
|
||||
def is_request_shielded(self, request):
|
||||
"""Return True if the request was shielded. Only use if
|
||||
is_request_from_cdn() returns True.
|
||||
|
||||
The header this method depends on is explicitly set by Fastly and
|
||||
explicitly not forwarded by CloudFront, guaranteeing it was not
|
||||
provided by the client if the request is coming from either CDN.
|
||||
"""
|
||||
# This is a straight header, not a META parameter set by uwsgi.
|
||||
return request.headers.get('x-amo-request-shielded') == 'true'
|
||||
|
||||
def process_request(self, request):
|
||||
"""Set REMOTE_ADDR to the client IP in a specific position in
|
||||
HTTP_X_FORWARDED_FOR depending on what the request went through."""
|
||||
position = 1
|
||||
x_forwarded_for_header = request.META.get('HTTP_X_FORWARDED_FOR')
|
||||
if x_forwarded_for_header:
|
||||
if self.is_request_from_cdn(request):
|
||||
position += 1
|
||||
if self.is_request_shielded(request):
|
||||
position += 1
|
||||
if self.is_request_from_gcp_environement(request):
|
||||
position += 1
|
||||
# If position is greater than 1 then we need to fix REMOTE_ADDR by
|
||||
|
|
|
@ -258,6 +258,47 @@ class TestSetRemoteAddrFromForwardedFor(TestCase):
|
|||
self.middleware.process_request(request)
|
||||
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
|
||||
|
||||
@override_settings(SECRET_CDN_TOKEN='foo')
|
||||
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
|
||||
def test_request_from_cdn_with_shield(self):
|
||||
request = RequestFactory().get(
|
||||
'/',
|
||||
REMOTE_ADDR='1.1.1.1',
|
||||
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 2.2.2.2, 1.1.1.1, 4.8.15.16',
|
||||
HTTP_X_REQUEST_VIA_CDN='foo',
|
||||
headers={'X-AMO-Request-Shielded': 'true'},
|
||||
)
|
||||
assert self.middleware.is_request_shielded(request)
|
||||
self.middleware.process_request(request)
|
||||
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
|
||||
|
||||
@override_settings(SECRET_CDN_TOKEN='foo')
|
||||
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
|
||||
def test_request_from_cdn_without_shield(self):
|
||||
# Shield header can be explicitly set to "false" instead of "true", we
|
||||
# just ignore it in that case.
|
||||
request = RequestFactory().get(
|
||||
'/',
|
||||
REMOTE_ADDR='1.1.1.1',
|
||||
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 1.1.1.1, 4.8.15.16',
|
||||
HTTP_X_REQUEST_VIA_CDN='foo',
|
||||
headers={'X-AMO-Request-Shielded': 'false'},
|
||||
)
|
||||
assert not self.middleware.is_request_shielded(request)
|
||||
self.middleware.process_request(request)
|
||||
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
|
||||
|
||||
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
|
||||
def test_request_not_from_cdn_should_ignore_shield_header(self):
|
||||
request = RequestFactory().get(
|
||||
'/',
|
||||
REMOTE_ADDR='1.1.1.1',
|
||||
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 4.8.15.16',
|
||||
HTTP_X_REQUEST_VIA_CDN='foo',
|
||||
)
|
||||
self.middleware.process_request(request)
|
||||
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
|
||||
|
||||
|
||||
class TestCacheControlMiddleware(TestCase):
|
||||
def setUp(self):
|
||||
|
|
Загрузка…
Ссылка в новой задаче