mirror of https://github.com/github/docs.git
refactor how archived frontmatter fallbacks work (#28170)
* refactor how archived frontmatter fallbacks work
* delete no-longer used file
* avoid the mention
* Update lib/redirects/README.md

Co-authored-by: Sarah Schneider <sarahs@users.noreply.github.com>
This commit is contained in:
Parent
f4aea3967c
Commit
2ebfb60b4b
@@ -19,5 +19,5 @@ mkdir translations
 # need these legacy redirects. Only the redirects from
 # front-matter will be at play.
 # These static redirects json files are notoriously large
-echo '[]' > lib/redirects/static/archived-frontmatter-fallbacks.json
+echo '[]' > lib/redirects/static/archived-frontmatter-valid-urls.json
 echo '{}' > lib/redirects/static/archived-redirects-from-213-to-217.json
@@ -50,13 +50,20 @@ As a workaround for these lost redirects, we have two files in `lib/redirects/st
 
 This file contains keys equal to old routes and values equal to new routes (aka snapshots of permalinks at the time) for versions 2.13 to 2.17. (The old routes were generated via `lib/redirects/get-old-paths-from-permalink.js`.)
 
-* `archived-frontmatter-fallbacks.json`
+* `archived-frontmatter-valid-urls.json`
 
-This file contains an array of arrays, where the child arrays are a group of all frontmatter redirects for each content file. This is essentially a list of all the historical paths for the articles in old versions. The problem is, we don't know which historical paths correspond to which versions.
+This file is an object of VALID_URL to VALID_REDIRECT_SOURCES.
+E.g. `"/enterprise/2.13/foo": ["/enterprise/2.13/bar", "/enterprise/2.13/buzz"]`
+It was originally based on a previous file called `archived-frontmatter-fallbacks.json`
+which had a record of each possible redirect candidate that we should bother
+redirecting to.
+Now, this new file has been created by accurately comparing it to the actual
+content inside the `github/help-docs-archived-enterprise-versions` repo for the
+version range of 2.13 to 2.17. So every key in `archived-frontmatter-valid-urls.json`
+corresponds to a file that would work.
 
-Here's how the `middleware/archived-enterprise-versions.js` fallback works: if someone tries to access an article that was updated via a now-lost frontmatter redirect (for example, an article at the path `/en/enterprise/2.15/user/articles/viewing-contributions-on-your-profile-page`), the middleware will first look for a redirect in `archived-redirects-from-213-to-217.json`. If it does not find one, it will look for a child array in `archived-frontmatter-fallbacks.json` that contains the requested path. If it finds a relevant array, it will try accessing all the other paths in the array until it finds one that returns a 200. For this example, it would successfully reach `/en/enterprise/2.15/user/articles/viewing-contributions-on-your-profile` (no `-page`).
-
-This is admittedly an inefficient brute-force approach. But requests for archived docs <2.18 are getting less and less common as organizations upgrade their Enterprise instances, and all the expensive calculation happens in the middleware on page request, not on server warmup, so at least it's a relatively isolated process.
+Here's how the `middleware/archived-enterprise-versions.js` fallback works: if someone tries to access an article that was updated via a now-lost frontmatter redirect (for example, an article at the path `/en/enterprise/2.15/user/articles/viewing-contributions-on-your-profile-page`), the middleware will first look for a redirect in `archived-redirects-from-213-to-217.json`. If it does not find one, it will look up the requested path in `archived-frontmatter-valid-urls.json`. If it finds a match, it will redirect to the corresponding valid URL, because that file knows exactly which URLs are valid in
+`help-docs-archived-enterprise-versions`.
 
 ## Tests
 
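The two-step lookup order described in the README text above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the middleware's actual code: the sample data and the helper name `resolveArchivedPath` are hypothetical, and only the shape of the two JSON files comes from the README.

```javascript
// Hypothetical sample data. The real files live in lib/redirects/static/.
// Shape of archived-redirects-from-213-to-217.json: old route -> new route.
const archivedRedirects = {
  '/enterprise/2.13/old-route': '/enterprise/2.13/new-route',
}
// Shape of archived-frontmatter-valid-urls.json: valid URL -> redirect sources.
const archivedFrontmatterValidUrls = {
  '/enterprise/2.13/foo': ['/enterprise/2.13/bar', '/enterprise/2.13/buzz'],
}

// Hypothetical helper: resolve a requested path using the documented order.
function resolveArchivedPath(path) {
  // Step 1: look for a direct redirect in the snapshot of old routes.
  if (Object.prototype.hasOwnProperty.call(archivedRedirects, path)) {
    return archivedRedirects[path]
  }
  // Step 2: look for a known-valid URL that lists this path as a source.
  for (const [validUrl, sources] of Object.entries(archivedFrontmatterValidUrls)) {
    if (sources.includes(path)) return validUrl
  }
  // No redirect is known for this path.
  return null
}
```

Because every key in the second file is known to exist in the archive, step 2 can redirect immediately without probing candidate URLs over HTTP. The linear scan in step 2 is kept here only for readability.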
File diffs not shown because they are too large (2 files)
@@ -32,8 +32,8 @@ function splitByLanguage(uri) {
 const archivedRedirects = readCompressedJsonFileFallbackLazily(
   './lib/redirects/static/archived-redirects-from-213-to-217.json'
 )
-const archivedFrontmatterFallbacks = readCompressedJsonFileFallbackLazily(
-  './lib/redirects/static/archived-frontmatter-fallbacks.json'
+const archivedFrontmatterValidURLS = readCompressedJsonFileFallbackLazily(
+  './lib/redirects/static/archived-frontmatter-valid-urls.json'
 )
 
 const cacheControl = cacheControlFactory(60 * 60 * 24 * 365)
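The loaders above return functions rather than data, which is why the middleware later calls `archivedFrontmatterValidURLS()`. The sketch below shows the general lazy, memoized pattern only. It is not the implementation of `readCompressedJsonFileFallbackLazily`, which this diff does not show; the helper `lazily` and the sample payload are hypothetical.

```javascript
// Hypothetical helper: wrap a loader so it runs at most once, on first call.
function lazily(load) {
  let loaded = false
  let value
  return () => {
    if (!loaded) {
      value = load()
      loaded = true
    }
    return value
  }
}

// Count how often the expensive load actually runs.
let loadCount = 0
const getValidUrls = lazily(() => {
  loadCount += 1
  // Stand-in for reading and parsing a large JSON file from disk.
  return { '/enterprise/2.13/foo': ['/enterprise/2.13/bar'] }
})
```

Nothing is read until `getValidUrls()` is first called, and later calls return the same cached object, so the cost is paid only by requests that actually need the file.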
@@ -219,22 +219,18 @@ export default async function archivedEnterpriseVersions(req, res, next) {
     return res.send(r.body)
   }
 
-  for (const fallbackRedirect of getFallbackRedirects(req, requestedVersion) || []) {
-    const statsTags = [`path:${req.path}`, `fallback:${fallbackRedirect}`]
-    const doGet = () =>
-      got(getProxyPath(fallbackRedirect, requestedVersion), {
-        throwHttpErrors: false,
-        retry: retryConfiguration,
-        timeout: timeoutConfiguration,
-      })
-
-    const r = await statsd.asyncTimer(doGet, 'archive_enterprise_proxy_fallback', [
-      ...statsdTags,
-      `fallback:${fallbackRedirect}`,
-    ])()
-    if (r.statusCode === 200) {
-      cacheAggressively(res)
+  // from 2.13 to 2.17, we lost access to frontmatter redirects during the archival process
+  // this workaround finds potentially relevant frontmatter redirects in currently supported pages
+  if (
+    versionSatisfiesRange(requestedVersion, `>=${firstVersionDeprecatedOnNewSite}`) &&
+    versionSatisfiesRange(requestedVersion, `<=${lastVersionWithoutArchivedRedirectsFile}`)
+  ) {
+    const statsTags = [`path:${req.path}`]
+    const fallbackRedirect = getFallbackRedirect(req)
+    if (fallbackRedirect) {
+      statsTags.push(`fallback:${fallbackRedirect}`)
       statsd.increment('middleware.trying_fallback_redirect_success', 1, statsTags)
+      cacheAggressively(res)
       return res.redirect(redirectCode, fallbackRedirect)
     }
     statsd.increment('middleware.trying_fallback_redirect_failure', 1, statsTags)
@@ -254,15 +250,56 @@ function getProxyPath(reqPath, requestedVersion) {
   return `https://github.github.com/help-docs-archived-enterprise-versions${proxyPath}`
 }
 
-// from 2.13 to 2.17, we lost access to frontmatter redirects during the archival process
-// this workaround finds potentially relevant frontmatter redirects in currently supported pages
-function getFallbackRedirects(req, requestedVersion) {
-  if (versionSatisfiesRange(requestedVersion, `<${firstVersionDeprecatedOnNewSite}`)) return
-  if (versionSatisfiesRange(requestedVersion, `>${lastVersionWithoutArchivedRedirectsFile}`)) return
+// Module-level global cache object.
+// Gets populated lazily inside getFallbackRedirect().
+const fallbackRedirectLookups = new Map()
 
-  // `archivedFrontmatterFallbacks` is a callable because it's a lazy function
-  // and memoized so calling it is cheap.
-  return archivedFrontmatterFallbacks().find((arrayOfFallbacks) =>
-    arrayOfFallbacks.includes(req.path)
-  )
+function getFallbackRedirect(req) {
+  // The file `lib/redirects/static/archived-frontmatter-valid-urls.json`, which
+  // we depend on here, is structured like this:
+  //
+  // {
+  //   "/enterprise/2.13/foo/bar": [
+  //     "/enterprise/2.13/other/old/thing",
+  //     "/enterprise/2.13/more/redirectable/url",
+  //     "/enterprise/2.13/etc/etc"
+  //   ],
+  //   ...
+  //
+  // The keys are valid URLs that it can redirect to. I.e. these are
+  // URLs that we definitely know are valid and will be found
+  // in https://github.com/github/help-docs-archived-enterprise-versions
+  // The array values are possible URLs we deem acceptable redirect
+  // sources.
+  // But to avoid an unnecessary O(n) loop on every request, we turn this
+  // structure around to become:
+  //
+  // {
+  //   "/enterprise/2.13/other/old/thing": "/enterprise/2.13/foo/bar",
+  //   "/enterprise/2.13/more/redirectable/url": "/enterprise/2.13/foo/bar",
+  //   "/enterprise/2.13/etc/etc": "/enterprise/2.13/foo/bar",
+  //   ...
+  //
+  // Now potential lookups are fast.
+  if (!fallbackRedirectLookups.size) {
+    for (const [destination, sources] of Object.entries(archivedFrontmatterValidURLS())) {
+      for (const source of sources) {
+        fallbackRedirectLookups.set(source, destination)
+      }
+    }
+  }
+
+  // But before we proceed, remember that the
+  // file lib/redirects/static/archived-frontmatter-valid-urls.json never
+  // contains a language prefix.
+  // E.g. only `/enterprise/2.13/foo/bar` but the requested URL can be
+  // `/en/enterprise/2.13/foo/bar`, `/pt/enterprise/2.13/foo/bar`,
+  // or just `/enterprise/2.13/foo/bar`.
+  // Whatever it is, pop the language prefix, operate, and put it back
+  // again. In the end, it always has to have a language prefix.
+  const [language, withoutLanguage] = splitPathByLanguage(req.path)
+  const fallback = fallbackRedirectLookups.get(withoutLanguage)
+  if (fallback) {
+    return `/${language}${fallback}`
+  }
 }
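The inverted lookup and the language-prefix handling in `getFallbackRedirect()` can be tried out in a self-contained form. The following is a sketch under assumptions, not the middleware itself: it takes a plain path instead of `req`, the sample data is made up, and `splitPathByLanguage` is a hypothetical stand-in (its real implementation is not part of this diff) that assumes a small language list and falls back to `en`.

```javascript
// Hypothetical sample in the shape of archived-frontmatter-valid-urls.json.
const validUrls = {
  '/enterprise/2.13/foo/bar': [
    '/enterprise/2.13/other/old/thing',
    '/enterprise/2.13/etc/etc',
  ],
}

// Invert once: redirect source -> valid destination, for constant-time lookups.
const lookups = new Map()
for (const [destination, sources] of Object.entries(validUrls)) {
  for (const source of sources) lookups.set(source, destination)
}

// Assumption: a tiny language list, for illustration only.
const LANGUAGES = ['en', 'pt']

// Hypothetical stand-in: returns [language, pathWithoutLanguagePrefix].
function splitPathByLanguage(path, fallbackLanguage = 'en') {
  const [, first, ...rest] = path.split('/')
  if (LANGUAGES.includes(first)) return [first, '/' + rest.join('/')]
  return [fallbackLanguage, path]
}

// Same idea as the diff's getFallbackRedirect(), but on a plain path string.
function getFallbackRedirect(path) {
  const [language, withoutLanguage] = splitPathByLanguage(path)
  const fallback = lookups.get(withoutLanguage)
  return fallback ? `/${language}${fallback}` : undefined
}
```

Under these assumptions, a request with or without a language prefix resolves to a prefixed destination, and a path that is not a known redirect source yields `undefined`, so the caller can fall through to its failure handling.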