
This site's search functionality is powered by Algolia, a third-party service.

To see all existing search-related issues and pull requests, visit github.com/github/docs/labels/search.



How it Works

The search data is synced automatically by a GitHub Actions workflow that is triggered by pushes to the main branch. The workflow generates structured data for all pages on the site, compares that data to what is currently on Algolia, and then adds, updates, or removes indices based on the diff between the local and remote data. It takes care not to create duplicate records and avoids unnecessary (and costly) indexing operations.
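The comparison step can be sketched roughly as follows. This is a minimal illustration rather than the workflow's actual code: the function names `diffRecords` and `sortKeys` are hypothetical, and it assumes records are flat objects matched by `objectID`.

```javascript
// Hypothetical sketch: compare locally generated records with the records
// currently stored remotely, matching them by objectID.
function diffRecords(localRecords, remoteRecords) {
  const remoteById = new Map(remoteRecords.map((r) => [r.objectID, r]));
  const localIds = new Set(localRecords.map((r) => r.objectID));

  const toAdd = [];
  const toUpdate = [];
  for (const record of localRecords) {
    const remote = remoteById.get(record.objectID);
    if (!remote) {
      // Not on the remote yet: needs to be created.
      toAdd.push(record);
    } else if (JSON.stringify(sortKeys(record)) !== JSON.stringify(sortKeys(remote))) {
      // Present on both sides but with different content: needs an update.
      toUpdate.push(record);
    }
    // Identical records are skipped, which avoids needless indexing operations.
  }

  // Remote records with no local counterpart are stale.
  const toRemove = remoteRecords
    .filter((r) => !localIds.has(r.objectID))
    .map((r) => r.objectID);

  return { toAdd, toUpdate, toRemove };
}

// Rebuild a flat object with sorted keys so that key order alone
// never counts as a change.
function sortKeys(obj) {
  const entries = Object.entries(obj).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return Object.fromEntries(entries);
}
```

Only the records in `toAdd` and `toUpdate` would need to be sent to the remote, and only the IDs in `toRemove` would need to be deleted.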

The Actions workflow usually takes about five minutes, and the progress can be viewed (by GitHub employees) in the Actions tab of the repo.

Development

In cases where a publicity event like GitHub Satellite or GitHub Universe demands a very tight shipping window, it is also possible to manually sync the indices with Algolia's servers from your local checkout of the repo, before your feature branch is merged to main. Manually syncing the indices can also be useful to test an unreleased GitHub Enterprise version or a translated language (Portuguese, Chinese, etc.) that is not yet in production.

To sync the indices from your development environment:

  1. Make sure the two required environment variables ALGOLIA_APPLICATION_ID and ALGOLIA_API_KEY are set in your .env file. These can be retrieved from the Algolia site.
  2. Run npm run sync-search-dry-run. This takes a while to complete. It will prepare, test, and validate all the indices without actually uploading anything to Algolia's servers.
  3. Run npm run sync-search to prepare the indices again and upload them to the Algolia servers.
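Before running either command, it can help to confirm that both variables from step 1 are actually set. The check below is a small sketch, not part of the repo: the helper name `missingEnvVars` is hypothetical, and it assumes the `.env` file has already been loaded into `process.env` (for example by a loader such as dotenv).

```javascript
// Variable names taken from step 1 above.
const REQUIRED_VARS = ['ALGOLIA_APPLICATION_ID', 'ALGOLIA_API_KEY'];

// Hypothetical helper: returns the names of required variables
// that are unset or empty in the given environment object.
function missingEnvVars(env = process.env) {
  return REQUIRED_VARS.filter((name) => !env[name]);
}

const missing = missingEnvVars();
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(', ')}`);
}
```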

Files

Indices

There's a separate search index for each combination of product and language. Some examples:

| Index Name | Description |
| --- | --- |
| github-docs-dotcom-cn | GitHub.com Chinese |
| github-docs-dotcom-en | GitHub.com English |
| github-docs-dotcom-es | GitHub.com Spanish |
| github-docs-dotcom-ja | GitHub.com Japanese |
| github-docs-2.18-cn | GitHub Enterprise 2.18 Chinese |
| github-docs-2.18-en | GitHub Enterprise 2.18 English |
| github-docs-2.18-es | GitHub Enterprise 2.18 Spanish |
| github-docs-2.18-ja | GitHub Enterprise 2.18 Japanese |
| github-docs-2.17-cn | GitHub Enterprise 2.17 Chinese |
| github-docs-2.17-en | GitHub Enterprise 2.17 English |
| github-docs-2.17-es | GitHub Enterprise 2.17 Spanish |
| github-docs-2.17-ja | GitHub Enterprise 2.17 Japanese |
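The names above follow a visible `github-docs-<version>-<language>` pattern. As an illustration only, the helper below (a hypothetical `indexName` function, not taken from the repo) reproduces the listed names from the versions and language codes shown.

```javascript
// Hypothetical helper: build an index name from a version and a language code,
// following the pattern visible in the examples above.
function indexName(version, language) {
  return `github-docs-${version}-${language}`;
}

// Versions and language codes as they appear in the examples above.
const versions = ['dotcom', '2.18', '2.17'];
const languages = ['cn', 'en', 'es', 'ja'];

// One index per combination of version and language.
const names = versions.flatMap((v) => languages.map((l) => indexName(v, l)));
```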

Records

Each record represents a section of a page. Sections are derived by splitting up pages by their headings. Each record has a title, intro (if one exists in the frontmatter), body content (in text, not HTML), a url, and a unique objectID that is currently just the permalink of the article. Here's an example:

{
  objectID: '/en/actions/creating-actions/about-actions#about-actions',
  url: 'https://help.github.com/en/actions/creating-actions/about-actions#about-actions',
  slug: 'about-actions',
  breadcrumbs: 'GitHub Actions / Creating actions / About actions',
  heading: 'About actions',
  title: 'About actions',
  content: "You can create actions by writing custom code that interacts with your repository in any way you'd like..."
}
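To make the page-to-record mapping concrete, here is a rough sketch of how one record per section could be assembled. It is not the repo's indexing code: the `buildRecords` function and the shape of its `page` argument (`permalink`, `baseUrl`, `breadcrumbs`, `title`, and a `sections` array already split by heading) are assumptions made for illustration.

```javascript
// Hypothetical sketch: turn one page, already split into sections by heading,
// into one record per section. The input shape is an assumption.
function buildRecords(page) {
  return page.sections.map((section) => {
    // The objectID is the article permalink plus the section anchor.
    const objectID = `${page.permalink}#${section.slug}`;
    return {
      objectID,
      url: `${page.baseUrl}${objectID}`,
      slug: section.slug,
      breadcrumbs: page.breadcrumbs,
      heading: section.heading,
      title: page.title,
      content: section.text, // plain text, not HTML
    };
  });
}
```

Because the `objectID` is derived deterministically from the permalink and the section slug, building the same page twice yields the same IDs.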

Notes

  • It's not strictly necessary to set an objectID as Algolia will create one automatically, but by creating our own we have a guarantee that subsequent invocations of this upload script will overwrite existing records instead of creating numerous duplicate records with differing IDs.
  • Algolia has typo tolerance. Try spelling something wrong and see what you get!
  • Algolia has lots of controls for customizing each index, so we can add weights to certain attributes and create rules like "title is more important than body", etc. But it works pretty well as-is without any configuration.
  • Algolia has support for "advanced query syntax" for exact matching of quoted expressions and exclusion of words preceded by a - sign. This is off by default, but we have it enabled in our browser client. This and many other settings can be configured in the Algolia.com web interface. The settings in the web interface can be overridden by the InstantSearch.js client. See javascripts/search.js.
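To illustrate what the advanced query syntax means for a search string, the sketch below splits a query into quoted phrases, excluded words, and ordinary terms. It only demonstrates the semantics described in the note; it is not Algolia's parser, and the function name `parseAdvancedQuery` is hypothetical.

```javascript
// Illustrative only: split a query into exact phrases ("..."),
// excluded words (-word), and ordinary terms.
function parseAdvancedQuery(query) {
  const phrases = [];
  const excluded = [];
  const terms = [];

  // Match either a double-quoted phrase or a run of non-whitespace characters.
  const tokenPattern = /"([^"]+)"|(\S+)/g;
  for (const match of query.matchAll(tokenPattern)) {
    if (match[1] !== undefined) {
      phrases.push(match[1]);
    } else if (match[2].startsWith('-') && match[2].length > 1) {
      excluded.push(match[2].slice(1));
    } else {
      terms.push(match[2]);
    }
  }
  return { phrases, excluded, terms };
}
```

For example, `parseAdvancedQuery('"pull request" review -draft')` treats `pull request` as an exact phrase, `review` as an ordinary term, and `draft` as a word to exclude.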