Update caching docs now that we're on the CDN (#10967)

* Update caching docs now that we're on the CDN

* Reformat

* Typos
This commit is contained in:
Mathieu Pillard 2021-10-15 17:39:13 +02:00 committed by GitHub
Parent 19569111c6
Commit d45bb4592d
No key found matching this signature
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 11 additions and 26 deletions

@@ -2,7 +2,7 @@
## Behavior
We currently use Nginx to cache HTTP requests. It works like a standard reverse proxy: when a request comes in, a cache key is generated from various information sent by the client, a cached response is returned if one is found, otherwise the request is forwarded to the origin and returned to the client while being put in cache if appropriate.
We currently have AMO behind a CDN, which caches HTTP requests. It works like a standard reverse proxy: when a request comes in, a cache key is generated from various information sent by the client, a cached response is returned if one is found, otherwise the request is forwarded to the origin and returned to the client while being put in cache if appropriate.
For the cache key, we take into account the following parameters:
@ -12,44 +12,29 @@ For the cache key, we take into account the following parameters:
- `Accept-Encoding`
- `User-Agent`
- `DNT`
- The following cookies, extracted from the `Cookie` HTTP header:
- `frontend_auth_token`
- `frontend_active_experiments`
- `sessionid`
Caching is bypassed if a request comes in with any of the following:
If a response is found in the cache with the key, it's returned and the request never reaches the origin server.
- `frontend_auth_token` cookie (extracted from the Cookie header)
- `disable_caching` query parameter in the URL
If a response is not found in the cache, the request is forwarded to the origin server, and if the response returned by the origin server contains a `Cache-Control: s-maxage=<value>` header, it's cached using the same logic to determine the key described above. The duration of the cache is the value of that header.
If a response is found in the cache with the key, and it's returned and the request never reaches the origin server.
Behind the scenes, the cache key is generated with a mix of hardcoded CDN configuration and HTTP headers listed in the `Vary` header(s) of the response. It might include more headers depending on the page: for instance, pages doing `Accept-Language` detection add that header to the key automatically by adding it to the `Vary` header in the response.
If a response is not found in the cache, the request is forwarded to the origin server, and if the response returned by the origin server contains an `X-Accel-Expires` header, it's cached using the same logic to determine the key described above. The duration of the cache is the value of that header.
Behind the scenes, the cache key is generated with a mix of hardcoded nginx configuration and HTTP headers listed in the `Vary` header(s) of the response. It might include more headers depending on the page: for instance, pages doing `Accept-Language` detection add that header to the key automatically by adding it to the `Vary` header in the response.
The origin will send a `X-Accel-Expires` header (causing nginx to cache response) on all responses unless the request came in with a `frontend_auth_token` or the response being generated is a 40x or 50x. On top of that header, a `Cache-Control: max-age=0` is sent by default (so browsers never cache the responses, to deal with authentication and back/forward cache interaction), and if the response is cacheable, a `s-maxage=180` is added, telling shared caches it's ok to cache the response.
The origin will send a `Cache-Control: s-maxage=<value>` header (causing the CDN to cache the response) on all responses unless the request came in with a `frontend_auth_token` or the response being generated is a 40x or 50x. On top of that, a `Cache-Control: max-age=0` is sent by default so browsers themselves never cache the responses, to deal with authentication and back/forward cache interaction.
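The header rule above can be sketched as a small function. This is only an illustration, not the actual origin code: `cache_control_for` is a hypothetical helper, combining both directives into a single `Cache-Control` value is an assumption, and treating every status of 400 and above as uncacheable is a simplification of "40x or 50x".

```python
def cache_control_for(request_cookies, status_code, s_maxage=180):
    """Return the Cache-Control value the origin might send (hypothetical helper).

    request_cookies: mapping of cookie name to value parsed from the request.
    status_code: HTTP status of the response being generated.
    """
    # Browsers are always told not to cache, whatever the response is.
    directives = ["max-age=0"]
    # Shared caches may keep the response only for anonymous, successful requests.
    cacheable = "frontend_auth_token" not in request_cookies and status_code < 400
    if cacheable:
        directives.append(f"s-maxage={s_maxage}")
    return ", ".join(directives)
```

For an anonymous `200` response this yields `max-age=0, s-maxage=180`; with a `frontend_auth_token` cookie or an error status only `max-age=0` remains, so the CDN has nothing telling it to store the response.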
## Additional considerations
### Cookies in requests
Caching is bypassed for some cookies as described above, but this is achieved without `Vary`, because we can't `Vary` on a specific cookie, only the whole header, which would include all cookies ever set on the AMO domain, including analytics - so we would likely see an extremely poor cache hit ratio if we did that.
### Cookies in responses
In addition to what's described above, Nginx is currently configured to never cache a response containing the `Set-Cookie` header. This is a safety measure that we could deactivate if needed. The value of the `Set-Cookie` header wouldn't affect the cache key.
We can't `Vary` on a specific cookie, only the whole header, which would include all cookies ever set on the AMO domain, including analytics - so we would likely see an extremely poor cache hit ratio if we did that. Therefore, cookies that affect the CDN cache are hardcoded in the CDN configuration for a given path pattern. This allows us to cache differently based on the value of `frontend_active_experiments` for instance, but any other cookie not specified in that configuration will be ignored for caching purposes. If a request comes in with a `foo=bar` cookie, it could be served the same response from cache as someone coming in without it.
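To illustrate why an unlisted cookie does not split the cache, here is a minimal sketch of an allowlist-based key. The real allowlist lives in the CDN configuration per path pattern; `KEY_COOKIES`, `parse_cookies` and `cookie_key_part` are hypothetical names used only for this example, and the cookie parsing is deliberately naive.

```python
# Hypothetical allowlist, mirroring the cookies named in this document.
KEY_COOKIES = ("frontend_auth_token", "frontend_active_experiments", "sessionid")


def parse_cookies(cookie_header):
    """Naively split a Cookie header into a name -> value dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def cookie_key_part(cookie_header):
    """Return the part of the cache key derived from cookies.

    Only allowlisted cookies contribute; everything else is ignored.
    """
    cookies = parse_cookies(cookie_header)
    return tuple((name, cookies[name]) for name in KEY_COOKIES if name in cookies)
```

With this sketch, `cookie_key_part("foo=bar")` and `cookie_key_part("")` produce the same empty key part, so both requests can share a cached response, while two different values of `frontend_active_experiments` produce two different keys.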
### Cache duration
We currently return `180` as the number of seconds to cache responses. Nginx is set to return potentially stale responses while it's populating the cache, so sometimes clients might see a cached response that is a bit older than that.
We currently return `180` as the number of seconds to cache responses. The CDN might serve stale responses while it's populating the cache, so clients might sometimes see a cached response that is a bit older than that.
### API
The API also has a similar caching layer, using a different set of parameters for the cache key: `User-Agent` and `DNT` are ignored, `Origin` is used instead, `frontend_active_experiments` cookie is ignored, and the cache is bypassed for requests coming in with a `sessionid` cookie or `Authorization` header instead of the `frontend_auth_token` cookie.
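The API differences can be sketched the same way. This is an assumption-laden illustration rather than the actual configuration: `API_KEY_HEADERS` only lists headers named in this document (the full list is not shown here), the URL and other key components are left out, and `api_cache_key` is a hypothetical helper.

```python
# Partial, illustrative list: `Origin` replaces `User-Agent` and `DNT` for the API.
API_KEY_HEADERS = ("accept-encoding", "origin")


def api_cache_key(headers, cookies):
    """Return the header part of the API cache key, or None when the cache is bypassed.

    headers: mapping of request header name to value (any case).
    cookies: mapping of cookie name to value.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    # Authenticated API requests never go through the cache.
    if "sessionid" in cookies or "authorization" in normalized:
        return None
    # `User-Agent`, `DNT` and `frontend_active_experiments` are ignored on purpose.
    return tuple((name, normalized.get(name, "")) for name in API_KEY_HEADERS)
```

Two anonymous requests that differ only in `User-Agent`, `DNT` or the `frontend_active_experiments` cookie get the same key here, while a different `Origin` yields a different one.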
## Future move to a CDN
Once the plans to move the main AMO domain to a CDN are implemented, CloudFront will replace nginx for the caching layer, but the core principles will remain the same.
There are a couple implementation differences:
- It uses `Cache-Control` header with a `s-maxage` or `max-age` value (the former takes precedence) instead of `X-Accel-Expires`. We already return that header.
- Instead of bypassing the cache the `frontend_auth_token` will be part of the cache key, but since the origin never returns a response with a `Cache-Control` header for requests with that cookie there shouldn't be any functional differences (this is mainly due to a limitation in the way CloudFront configuration works).
- The `Set-Cookie` behavior could be re-implemented if needed but that hasn't been done yet.