Document how our search works. (#8713)
Fixes #8090

This simply states the facts about how search currently works. It doesn't mention any improvement ideas in the docs directly.

In addition to that, personal improvement ideas:

* Remove `listed_authors.name` from the scoring and make search against authors only possible via separate field queries, e.g. `Search addons author:Chris` or so. This may simplify scoring against the add-on name a lot.
* https://github.com/mozilla/addons-server/issues/6815 (Remove prefix boosting to simplify scoring)
* Maybe just use https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html instead of all our custom rules here and there?
* Remove negative boosts: https://github.com/mozilla/addons-server/issues/6838
This commit is contained in:
Parent
3b1e8bfa04
Commit
224859a641
@ -264,8 +264,8 @@ This endpoint allows you to fetch a specific add-on by id, slug or guid.

For backwards-compatibility reasons, the value for type of Theme
currently live on production addons.mozilla.org is ``persona``
(Lightweight Theme). ``theme`` refers to a deprecated XUL Complete Theme.
New webextension packaged non-dynamic themes are ``statictheme``.

============== ==========================================================
Value          Description
@ -42,8 +42,8 @@ Detail
This endpoint allows you to fetch a single collection by its ``slug``.
It returns any ``public`` collection by the specified user. You can access
a non-``public`` collection only if it was authored by you, the authenticated user.
If you have ``Admin:Curation`` permission you can see any collection belonging
to the ``mozilla`` user.


.. http:get:: /api/v4/accounts/account/(int:user_id|string:username)/collections/(string:collection_slug)/
@ -19,5 +19,6 @@ Development
services
translations
style
search
docs
../../../README.rst
@ -0,0 +1,97 @@
.. _search:

============================
How does search on AMO work?
============================

.. note::

    This documents the current state of how search is implemented in addons-server.
    We will use it to plan future improvements, so please note that we are
    aware that the process described below is not perfect and has bugs here and there.

    Please see https://github.com/orgs/mozilla/projects/17#card-10287357 for more planning.


General structure
=================

Our search contains Add-ons (``addons`` index) and Add-on compatibility report documents (``compat`` index).

In addition to that we store the following data:

* Add-on Versions (`Indexer / Serializer <https://github.com/mozilla/addons-server/blob/master/src/olympia/addons/indexers.py#L215-L237>`_)
* Files for each Add-on Version
* Compatibility information for each Add-on Version

As well as:

* Authors
* Previews (image links)
* Translations (`Translations mapping generation <https://github.com/mozilla/addons-server/blob/master/src/olympia/amo/indexers.py#L40-L136>`_)

We also store various other add-on related properties. See the `Add-on Indexer / Serializer <https://github.com/mozilla/addons-server/blob/master/src/olympia/addons/indexers.py#L215-L237>`_ for more details.

Our search can be reached via the API, through :ref:`/api/v4/addons/search/ <addon-search>` or :ref:`/api/v4/addons/autocomplete/ <addon-autocomplete>`, both of which are used by our addons-frontend, or via our legacy pages (which are used much less).

The API and the legacy pages use similar filtering and scoring code, although for legacy reasons they are not identical. We should focus on our API-based search, since the legacy search will be removed once support for Thunderbird and SeaMonkey has moved to a new platform.

The legacy search uses ElasticSearch to query the data and then requests the actual model objects from the database. The newer API-based search only hits ElasticSearch and uses the data stored in ES directly, which is much more efficient.
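
As a rough illustration of how a client might address the API-based search described above, the sketch below only builds the request URL and performs no network access. The host and the ``q`` parameter name are assumptions made for this example; check them against the search endpoint reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed host for illustration; the path is the search endpoint named above.
API_BASE = "https://addons.mozilla.org"
SEARCH_PATH = "/api/v4/addons/search/"


def build_search_url(search_query, **filters):
    """Return the URL for a search request without sending it."""
    # "q" is assumed to be the name of the free-text query parameter.
    params = {"q": search_query, **filters}
    return f"{API_BASE}{SEARCH_PATH}?{urlencode(params)}"


print(build_search_url("tab manager"))
```

Extra keyword arguments are passed through as additional query-string filters, so the same helper could carry any of the filters documented for the endpoint.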


Flow of a search query through AMO
==================================

Let's assume we search on addons-frontend (not legacy). The search query hits the API and gets handled by ``AddonSearchView``, which directly queries ElasticSearch and doesn't involve the database at all.

There are a few filters that are described in the :ref:`/api/v4/addons/search/ docs <addon-search>` but most of them are not very relevant for raw search queries. Examples are filters by guid, platform, category, add-on type or appversion.

Much more relevant for raw add-on searches (and this is primarily what is used when you search on the frontend) is ``SearchQueryFilter``.

It composes various rules to define a more or less usable ranking:

Primary rules
-------------

These are the ones using the strongest boosts, so they are only applied
to a specific set of fields like the name, the slug and authors.

**Applied rules** (merged via ``should``):

1. Prefer ``term`` matches on ``name.raw`` (``boost=100``) - our attempt to implement exact matches
2. Prefer phrase matches that allow swapped terms (``boost=4``, ``slop=1``)
3. If a query is < 20 characters long, try to do a fuzzy match on the search query (``boost=2``, ``prefix_length=4``, ``fuzziness=AUTO``)
4. Then text matches, using the standard text analyzer (``boost=3``, ``analyzer=standard``, ``operator=and``)
5. Then text matches, using a language specific analyzer (``boost=2.5``)
6. Then look for the query as a prefix (``boost=1.5``)
7. If we have a matching analyzer, add a query to ``name_l10n_{LANG}`` (``boost=2.5``, ``operator=and``)

Rules 4, 5 and 6 are added for ``name`` and ``listed_authors.name``.
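
The primary rules above can be sketched as plain Elasticsearch query DSL dictionaries. This is a minimal illustration, not the code used in addons-server: the function name is made up for this example, the field used for rules 2 and 3 (``name``) is an assumption, and so is the way rule 5 passes the language analyzer to a ``match`` query.

```python
def primary_should_rules(search_query, analyzer=None):
    """Build the primary ``should`` clauses as plain query DSL dicts."""
    should = [
        # Rule 1: exact match against the non-analyzed name.
        {"term": {"name.raw": {"value": search_query, "boost": 100}}},
        # Rule 2: phrase match; slop=1 tolerates swapped terms.
        # Querying ``name`` here is an assumption.
        {"match_phrase": {"name": {
            "query": search_query, "boost": 4, "slop": 1}}},
    ]
    # Rule 3: fuzzy match, only for queries shorter than 20 characters
    # (the field is again an assumption).
    if len(search_query) < 20:
        should.append({"match": {"name": {
            "query": search_query, "boost": 2,
            "prefix_length": 4, "fuzziness": "AUTO"}}})
    # Rules 4, 5 and 6 are added for both fields.
    for field in ("name", "listed_authors.name"):
        # Rule 4: standard analyzer, all terms required.
        should.append({"match": {field: {
            "query": search_query, "boost": 3,
            "analyzer": "standard", "operator": "and"}}})
        # Rule 5: language specific analyzer, if one is known.
        if analyzer:
            should.append({"match": {field: {
                "query": search_query, "boost": 2.5,
                "analyzer": analyzer}}})
        # Rule 6: the query as a prefix.
        should.append({"prefix": {field: {
            "value": search_query, "boost": 1.5}}})
    # Rule 7: query the localized name field for the matching analyzer.
    if analyzer:
        should.append({"match": {f"name_l10n_{analyzer}": {
            "query": search_query, "boost": 2.5, "operator": "and"}}})
    return should
```

Because every clause lands in one ``should`` list, a document matching several rules accumulates their scores, which is why the exact-match boost of 100 dominates the ranking.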


Secondary rules
---------------

These are the ones using the weakest boosts. They are applied to fields
containing more text like description, summary and tags.

**Applied rules** (merged via ``should``):

1. Look for phrase matches inside the summary (``boost=0.8``)
2. Look for phrase matches inside the summary using language specific
   analyzer (``boost=0.6``)
3. Look for phrase matches inside the description (``boost=0.3``)
4. Look for phrase matches inside the description using language
   specific analyzer (``boost=0.6``)
5. Look for matches inside tags (``boost=0.1``)
6. Append a separate ``match`` query for every word to boost tag matches (``boost=0.1``)
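
A comparable sketch for the secondary rules follows. It is illustrative only: the function name is invented, the localized field names ``summary_l10n_{LANG}`` and ``description_l10n_{LANG}`` are assumed by analogy with ``name_l10n_{LANG}``, and rules 5 and 6 are interpreted here as a single per-word loop over ``tags``.

```python
def secondary_should_rules(search_query, analyzer=None):
    """Build the secondary ``should`` clauses as plain query DSL dicts."""
    should = [
        # Rule 1: phrase match inside the summary.
        {"match_phrase": {"summary": {
            "query": search_query, "boost": 0.8}}},
        # Rule 3: phrase match inside the description.
        {"match_phrase": {"description": {
            "query": search_query, "boost": 0.3}}},
    ]
    # Rules 5 and 6: one ``match`` query on tags per word of the query.
    for word in search_query.split():
        should.append({"match": {"tags": {"query": word, "boost": 0.1}}})
    # Rules 2 and 4: language specific phrase matches.
    # The localized field names below are assumptions.
    if analyzer:
        for field in ("summary", "description"):
            should.append({"match_phrase": {f"{field}_l10n_{analyzer}": {
                "query": search_query, "boost": 0.6,
                "analyzer": analyzer}}})
    return should
```

All boosts here stay below 1, so these clauses mostly act as tie-breakers between documents that score similarly on the primary rules.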


General query flow
------------------

1. Fetch current translation
2. Fetch locale specific analyzer (`List of analyzers <https://github.com/mozilla/addons-server/blob/master/src/olympia/constants/search.py#L15-L61>`_)
3. Merge primary and secondary *should* rules
4. Create a ``function_score`` query that uses a ``field_value_factor`` function on ``boost``
5. Add a specific boost for webextension related add-ons
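
The flow above could be assembled roughly as follows. This is a sketch under stated assumptions rather than the actual implementation: the locale-to-analyzer mapping is a hypothetical excerpt, and both the ``is_webextension`` field name and the weight of ``2.0`` are placeholders chosen for illustration.

```python
# Hypothetical excerpt; the real list lives in the linked constants module.
ANALYZER_LOCALES = {"english": ["en-US", "en-GB"], "french": ["fr"]}


def get_locale_analyzer(locale):
    """Step 2: return the analyzer name for a locale, or None."""
    for analyzer, locales in ANALYZER_LOCALES.items():
        if locale in locales:
            return analyzer
    return None


def build_search_body(primary_rules, secondary_rules):
    """Steps 3 to 5: merge the rules and wrap them in ``function_score``."""
    should = list(primary_rules) + list(secondary_rules)
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should}},
                "functions": [
                    # Step 4: multiply the score by the document's ``boost``.
                    {"field_value_factor": {"field": "boost", "missing": 1.0}},
                    # Step 5: extra weight for webextensions. The field name
                    # and the weight are assumptions, not the real values.
                    {"filter": {"term": {"is_webextension": True}},
                     "weight": 2.0},
                ],
            }
        }
    }
```

The resulting dictionary has the shape of a search request body. ``missing`` guards against documents without a ``boost`` value, which would otherwise make ``field_value_factor`` fail.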