[bug 721411] Document search scoring

This adds documentation for search scoring as it's currently implemented.
Additionally, it adds some minor notes about where the ES-related code is
and links to the search view where the filters are.

Also adds a link to elasticsearch-head.
Will Kahn-Greene 2012-02-06 19:19:25 -05:00
Parent d477bb7436
Commit 1afcc2fc91
1 changed file: 141 additions and 5 deletions


@@ -22,11 +22,11 @@ search or Google's site search.
 .. Note::

-   Right now we're rewriting our search system to use Elastic and
-   switching between Sphinx and Elastic. At some point, the results
-   we're getting with our Elastic-based code will be good enough to
-   switch over. At that point, we'll remove the Sphinx-based search
-   code.
+   Right now we're rewriting our search system to use Elastic Search
+   and switching between Sphinx and Elastic Search. At some point,
+   the results we're getting with our Elastic Search-based code will
+   be good enough to switch over. At that point, we'll remove the
+   Sphinx-based search code.

    Until then, we have instructions for installing both Sphinx Search
    and Elastic Search.
@@ -281,3 +281,139 @@ You can see Elastic Search statistics/health with::

The last few lines tell you how many documents are in the index by
doctype. I use this to make sure I've got stuff in my index.

Tools
-----

One tool that's helpful for Elastic Search work is `elasticsearch-head
<https://github.com/mobz/elasticsearch-head>`_. It's like phpMyAdmin
for Elastic Search.

Implementation details
----------------------

Kitsune uses `elasticutils
<https://github.com/davedash/elasticutils>`_ and `pyes
<https://github.com/aparo/pyes>`_.

Most of our code is in the ``search`` app in ``apps/search/``.

Models in Kitsune that are indexable use ``SearchMixin``, defined in
``models.py``.

Utility functions are implemented in ``es_utils.py``.

Subcommands for ``manage.py`` are implemented in
``management/commands/``.

Search Scoring
==============

These are the defaults that apply to all searches:

kb:
    query fields: title, content, summary, keywords

    weights:

    ======== ======
    field    weight
    ======== ======
    title    6
    content  1
    keywords 4
    summary  2
    ======== ======

questions:
    query fields: title, question_content, answer_content

    weights:

    ================ ======
    field            weight
    ================ ======
    title            4
    question_content 3
    answer_content   3
    ================ ======

forums:
    query fields: title, content

    weights:

    ======== ======
    field    weight
    ======== ======
    title    2
    content  1
    ======== ======

.. Note::

   The query fields and weights are shared between our Sphinx code and
   our Elastic Search code.
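One way to picture the weights as query-level boosts is as an Elasticsearch query body with one boosted ``match`` clause per query field. This is a minimal Python sketch, assuming the request syntax of current Elasticsearch releases rather than the pyes/elasticutils calls Kitsune uses; ``WEIGHTS`` and ``build_query`` are hypothetical names.

```python
import json

# Hypothetical restatement of the default weights from the tables above.
WEIGHTS = {
    "kb": {"title": 6, "content": 1, "keywords": 4, "summary": 2},
    "questions": {"title": 4, "question_content": 3, "answer_content": 3},
    "forums": {"title": 2, "content": 1},
}


def build_query(doctype, text):
    """Build a query body that searches ``text`` in each query field of
    ``doctype``, boosting every field by its weight (sketch only)."""
    should = [
        {"match": {field: {"query": text, "boost": weight}}}
        for field, weight in WEIGHTS[doctype].items()
    ]
    # With only "should" clauses, a document needs at least one matching
    # field, and the scores of all matching clauses are added together.
    return {"query": {"bool": {"should": should}}}


print(json.dumps(build_query("forums", "crash on startup"), indent=2))
```

Raising a field's weight makes matches in that field count for more without changing which documents match.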

Elastic Search is built on top of Lucene, so the `Lucene documentation
on scoring <http://lucene.apache.org/java/3_5_0/scoring.html>`_ covers
how a document is scored with regard to the search query and its
contents. The weights modify that score: they're query-level boosts.
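To make the effect of the weights concrete, here is a deliberately simplified scoring model in Python. It assumes the per-field clauses are combined in a Lucene ``BooleanQuery`` (an assumption: a ``dis_max`` query would take the best field score instead of summing), treats each field's unboosted score as given, and keeps Lucene's coordination factor (matching clauses divided by total clauses). It ignores ``queryNorm``, which is the same for every document in a query and so doesn't change the ranking. The raw scores are made up, and ``boosted_score`` is a hypothetical helper.

```python
def boosted_score(raw_scores, weights):
    """Simplified Lucene-style score for an OR over boosted fields.

    raw_scores: field -> unboosted score for this document (0 if no match).
    weights:    field -> query-level boost.
    queryNorm is ignored: it is constant per query, so ranking is unaffected.
    """
    matched = [f for f in weights if raw_scores.get(f, 0) > 0]
    if not matched:
        return 0.0
    total = sum(weights[f] * raw_scores[f] for f in matched)
    coord = len(matched) / len(weights)  # fraction of clauses that matched
    return coord * total


KB_WEIGHTS = {"title": 6, "content": 1, "keywords": 4, "summary": 2}

# Made-up raw scores: doc A matches only in the title, doc B only in content.
doc_a = {"title": 0.5}
doc_b = {"content": 2.0}
print(boosted_score(doc_a, KB_WEIGHTS))  # 6 * 0.5 * 1/4 = 0.75
print(boosted_score(doc_b, KB_WEIGHTS))  # 1 * 2.0 * 1/4 = 0.5
```

With the kb weights, a title match outranks a content match here even though its raw score is a quarter as large.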

Additionally, we use a series of filters on ``tags``, ``q_tags``, and
other document properties like ``has_helpful``, ``is_locked``, and
``is_archived``. In Elastic Search, filters remove items from the
result set, but don't otherwise affect the scoring.
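A small Python model of that behavior: filters only decide which documents stay in the result set, and the ranking of the survivors is unchanged. In current Elasticsearch releases the same idea is expressed with the ``filter`` clause of a ``bool`` query, which runs in filter context and doesn't contribute to the score. ``filtered_search`` and the document fields below are hypothetical.

```python
def filtered_search(docs, score, filters):
    """Drop documents that fail any filter, then rank the rest by score.

    The filters only decide membership in the result set; a document's
    score is computed the same way whether or not filters are applied.
    """
    kept = [doc for doc in docs if all(f(doc) for f in filters)]
    return sorted(((score(doc), doc["id"]) for doc in kept), reverse=True)


# Hypothetical documents with a precomputed relevance value.
DOCS = [
    {"id": 1, "relevance": 3.0, "is_archived": False},
    {"id": 2, "relevance": 5.0, "is_archived": True},
    {"id": 3, "relevance": 1.0, "is_archived": False},
]


def relevance(doc):
    return doc["relevance"]


def not_archived(doc):
    return not doc["is_archived"]


print(filtered_search(DOCS, relevance, []))              # [(5.0, 2), (3.0, 1), (1.0, 3)]
print(filtered_search(DOCS, relevance, [not_archived]))  # [(3.0, 1), (1.0, 3)]
```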

Front page search
-----------------

A front page search is what happens when you start on the front page,
enter a search query in the search box, and click the green arrow.

Front page search does the following:

1. searches only kb articles and questions
2. (filter) kb articles must be tagged with the product (e.g. "desktop")
3. (filter) kb articles must not be archived
4. (filter) kb articles must be in either the Troubleshooting (10) or
   the How-to (20) category
5. (filter) questions must be tagged with the product (e.g. "desktop")
6. (filter) questions must have an answer marked as helpful

Results are scored as specified above.
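The front page filters can be sketched as a single Python predicate. The document shape (``type``, ``tags``, ``is_archived``, ``category``, ``has_helpful``) and the ``front_page_filter`` helper are assumptions for illustration; Kitsune's real index fields and query code may differ.

```python
# Category ids from the list above.
TROUBLESHOOTING = 10
HOW_TO = 20


def front_page_filter(doc, product):
    """Return True if a (hypothetical) document passes the front page filters."""
    if doc["type"] == "kb":
        return (
            product in doc["tags"]
            and not doc["is_archived"]
            and doc["category"] in (TROUBLESHOOTING, HOW_TO)
        )
    if doc["type"] == "question":
        return product in doc["tags"] and doc["has_helpful"]
    # Front page search covers only kb articles and questions.
    return False


docs = [
    {"id": "kb-1", "type": "kb", "tags": ["desktop"],
     "is_archived": False, "category": 10},
    {"id": "kb-2", "type": "kb", "tags": ["desktop"],
     "is_archived": True, "category": 20},
    {"id": "kb-3", "type": "kb", "tags": ["mobile"],
     "is_archived": False, "category": 20},
    {"id": "q-1", "type": "question", "tags": ["desktop"], "has_helpful": True},
    {"id": "q-2", "type": "question", "tags": ["desktop"], "has_helpful": False},
    {"id": "f-1", "type": "forum", "tags": ["desktop"]},
]
print([d["id"] for d in docs if front_page_filter(d, "desktop")])  # ['kb-1', 'q-1']
```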

Advanced search
---------------

The filters applied match the fields of the advanced search form. For
example, if you search for knowledge base articles in the
Troubleshooting category, then we add a filter requiring that results
be in the Troubleshooting category.
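The correspondence between form fields and filters could be sketched like this. The form field names, the index field names in ``FORM_TO_FIELD``, and the ``filters_from_form`` helper are all hypothetical, and the clauses use the ``term`` query syntax of current Elasticsearch releases; the real logic lives in ``apps/search/views.py``.

```python
# Hypothetical mapping from advanced search form fields to index fields.
FORM_TO_FIELD = {
    "category": "category",
    "product": "tags",
    "is_archived": "is_archived",
}


def filters_from_form(form):
    """Turn submitted advanced search form values into term filters.

    Empty or missing form fields add no filter, so they don't narrow
    the results. False and 0 are kept, since they are real filter values.
    """
    clauses = []
    for form_field, index_field in FORM_TO_FIELD.items():
        value = form.get(form_field)
        if value in (None, "", []):
            continue
        clauses.append({"term": {index_field: value}})
    return clauses


print(filters_from_form({"category": 10, "product": ""}))
# [{'term': {'category': 10}}]
```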

Link to the code
----------------

Here's a link to the search view in the master branch, which is what's
on dev:

https://github.com/mozilla/kitsune/blob/master/apps/search/views.py

Here's a link to the search view in the next branch, which is what's
on staging:

https://github.com/mozilla/kitsune/blob/next/apps/search/views.py