Set REMOTE_ADDR correctly when requests are going through a CDN Shield (#22603)

* Set REMOTE_ADDR correctly when requests are going through a CDN Shield

https://www.fastly.com/documentation/guides/concepts/shielding/

CDN Shield is an additional PoP requests can go through, and their
IP is added to X-Forwarded-For so we need to look for the client IP
in a different place. The special X-AMO-Request-Shielded header is
set to "true" in that case.

* Tweak documentation about IP addresses and move them from middleware docstring to their own doc
This commit is contained in:
Mathieu Pillard 2024-08-29 15:05:10 +02:00 коммит произвёл GitHub
Родитель b9dea9f31d
Коммит 697bffd6a9
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: B5690EEEBB952194
4 изменённых файлов: 131 добавлений и 36 удалений

Просмотреть файл

@ -14,6 +14,7 @@ refers to this project.
topics/readme_include
topics/api/index
topics/development/index
topics/remote_addr
topics/third-party
topics/blocklist
```

Просмотреть файл

@ -0,0 +1,73 @@
(remote_addr)=
# IP addresses
AMO records IP addresses for various user actions, allowing us to correlate
user activity to find malicious and abusive actors.
## Processing
`SetRemoteAddrFromForwardedFor` middleware is responsible for processing the
various headers and meta information we receive and setting
`META['REMOTE_ADDR']` on the `request` object passed to each view.
The request flow is either:
Client -> CDN -> Load balancer -> WSGI proxy -> addons-server
or
Client -> CDN -> CDN shield -> Load balancer -> WSGI proxy -> addons-server
or
Client -> Load balancer -> WSGI proxy -> addons-server
Currently:
- CDN is CloudFront or Fastly
- CDN shield is an additional PoP on the CDN that request can go through
(only enabled on Fastly)
- Load Balancer is GKE Ingress (GCP)
- WSGI proxy is nginx + uwsgi
CDN is set up to add a `X-Request-Via-CDN` header set to a secret value known
to us so we can verify the request did originate from the CDN.
If the request was shielded by the CDN it sets the `X-AMO-Request-Shielded`
header to `"true"`. This header should only be trusted if `X-Request-Via-CDN`
has been verified already.
Nginx converts `X-Request-Via-CDN` and `X-Forwarded-For` to
`HTTP_X_REQUEST_VIA_CDN` and `HTTP_X_FORWARDED_FOR` parameters, respectively.
The `X-Forwarded-For` header is potentially user input. When intermediary
servers in the flow described above add their own IP to it, they are always
appending to the list, so we can only trust specific positions starting
from the right, anything else cannot be trusted.
CDN always makes origin requests with a `X-Forwarded-For` header set to
"Client IP, CDN IP", so the client IP will be second to last for a CDN
request. If the request was shielded, the shield PoP IP will be added so
the client IP will be third to last.
On GCP, GKE Ingress appends its own IP to that header, resulting
in a value of "Client IP, CDN IP, GKE Ingress IP" (or
"Client IP, CDN IP, CDN Shield IP, GKE Ingress IP" for shielded requests),
so the client IP will be third to last, or fourth to last if there was a
CDN Shield.
We are no longer hosted on AWS, but it's worth noting that on AWS, the classic
ELB we were using did not make any alterations to `X-Forwarded-For`. For this
reason, we only shift the client IP position we look at to account for the
Load Balancer if `DEPLOY_PLATFORM` environ variable is set to `"gcp"`.
If the request didn't come from the CDN and is a direct origin request, on
AWS we can use `REMOTE_ADDR`, but on GCP we'd get the GKE Ingress IP, and the
`X-Forwarded-For` value will be "Client IP, GKE Ingress IP", so the client IP
will be second to last.
## Recording
Through several older models in the code have a dedicated field for this,
more recent implementations should use `IPLog`, which is automatically
populated when an `ActivityLog` action constant is defined with `store_ip` set
to `True` (note that if the `keep` property isn't defined, we don't keep the
activity forever and therefore ultimately delete the associated `IPLog`
instance as well)

Просмотреть файл

@ -286,42 +286,7 @@ class SetRemoteAddrFromForwardedFor(MiddlewareMixin):
"""
Set REMOTE_ADDR from HTTP_X_FORWARDED_FOR if necessary.
The request flow is either:
Client -> CDN -> Load balancer -> WSGI proxy -> addons-server.
or
Client -> Load balancer -> WSGI proxy -> addons-server.
Currently:
- CDN is CloudFront
- Load Balancer is either a classic ELB (AWS) or GKE Ingress (GCP)
- WSGI proxy is nginx + uwsgi
CloudFront is set up to add a X-Request-Via-CDN header set to a secret
value known to us so we can verify the request did originate from the CDN.
Nginx converts X-Request-Via-CDN and X-Forwarded-For to
HTTP_X_REQUEST_VIA_CDN and HTTP_X_FORWARDED_FOR, respectively.
The X-Forwarded-For header is potentially user input. When intermediary
servers in the flow described above add their own IP to it, they are always
appending to the list, so we can only trust specific positions starting
from the right, anything else cannot be trusted.
CloudFront always makes origin requests with a X-Forwarded-For header
set to "Client IP, CDN IP", so the client IP will be second to last for a
CDN request.
On AWS, the classic ELB we're using does not make any alterations to
X-Forwarded-For.
On GCP, GKE Ingress appends its own IP to that header, resulting
in a value of "Client IP, CDN IP, GKE Ingress IP", so the client IP will be
third to last.
If the request didn't come from the CDN and is a direct origin request, on
AWS we can use REMOTE_ADDR, but on GCP we'd get the GKE Ingress IP, and the
X-Forwarded-For value will be "Client IP, GKE Ingress IP", so the client IP
will be second to last.
See https://mozilla.github.io/addons-server/topics/remote_addr.html.
"""
def is_request_from_cdn(self, request):
@ -332,12 +297,27 @@ class SetRemoteAddrFromForwardedFor(MiddlewareMixin):
def is_request_from_gcp_environement(self, request):
return os.environ.get('DEPLOY_PLATFORM') == 'gcp'
def is_request_shielded(self, request):
"""Return True if the request was shielded. Only use if
is_request_from_cdn() returns True.
The header this method depends on is explicitly set by Fastly and
explicitly not forwarded by CloudFront, guaranteeing it was not
provided by the client if the request is coming from either CDN.
"""
# This is a straight header, not a META parameter set by uwsgi.
return request.headers.get('x-amo-request-shielded') == 'true'
def process_request(self, request):
"""Set REMOTE_ADDR to the client IP in a specific position in
HTTP_X_FORWARDED_FOR depending on what the request went through."""
position = 1
x_forwarded_for_header = request.META.get('HTTP_X_FORWARDED_FOR')
if x_forwarded_for_header:
if self.is_request_from_cdn(request):
position += 1
if self.is_request_shielded(request):
position += 1
if self.is_request_from_gcp_environement(request):
position += 1
# If position is greater than 1 then we need to fix REMOTE_ADDR by

Просмотреть файл

@ -258,6 +258,47 @@ class TestSetRemoteAddrFromForwardedFor(TestCase):
self.middleware.process_request(request)
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
@override_settings(SECRET_CDN_TOKEN='foo')
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
def test_request_from_cdn_with_shield(self):
request = RequestFactory().get(
'/',
REMOTE_ADDR='1.1.1.1',
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 2.2.2.2, 1.1.1.1, 4.8.15.16',
HTTP_X_REQUEST_VIA_CDN='foo',
headers={'X-AMO-Request-Shielded': 'true'},
)
assert self.middleware.is_request_shielded(request)
self.middleware.process_request(request)
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
@override_settings(SECRET_CDN_TOKEN='foo')
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
def test_request_from_cdn_without_shield(self):
# Shield header can be explicitly set to "false" instead of "true", we
# just ignore it in that case.
request = RequestFactory().get(
'/',
REMOTE_ADDR='1.1.1.1',
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 1.1.1.1, 4.8.15.16',
HTTP_X_REQUEST_VIA_CDN='foo',
headers={'X-AMO-Request-Shielded': 'false'},
)
assert not self.middleware.is_request_shielded(request)
self.middleware.process_request(request)
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
@patch.dict(os.environ, {'DEPLOY_PLATFORM': 'gcp'})
def test_request_not_from_cdn_should_ignore_shield_header(self):
request = RequestFactory().get(
'/',
REMOTE_ADDR='1.1.1.1',
HTTP_X_FORWARDED_FOR='7.7.7.7, 2.3.4.2, 4.8.15.16',
HTTP_X_REQUEST_VIA_CDN='foo',
)
self.middleware.process_request(request)
assert request.META['REMOTE_ADDR'] == '2.3.4.2'
class TestCacheControlMiddleware(TestCase):
def setUp(self):