update elasticsearch install docs

This commit is contained in:
Kevin Ngo 2012-06-26 15:34:34 -07:00
Parent 9d9e2c8ba8
Commit 5cbcfb926b
1 changed file: 65 additions and 4 deletions

@@ -4,32 +4,61 @@
=============
elasticsearch
=============
elasticsearch is a (magical black-box beast) search server that is used like a
key-value store.
Installation::
brew install elasticsearch # or whatever your package manager is called.
apt-get/aptitude install elasticsearch
Launch elasticsearch. If you used homebrew, `brew info elasticsearch`
will show you the commands to launch. If you used aptitude, elasticsearch
comes with a start-stop daemon script in /etc/init.d as well as configuration
files in /etc/elasticsearch.
Mappings::
./manage.py shell_plus
from stats.search import setup_indexes
setup_indexes()
./manage.py shell_plus --settings=your_local_mkt_settings
from mkt.stats.search import setup_mkt_indexes
setup_mkt_indexes()
Setting up the mappings is similar to defining a database schema or the
structure of a table. Just as different tables have different columns, we
define a separate mapping for each kind of document, explicitly listing the
fields to store and their types. We can also define which analyzer or
tokenizer ElasticSearch uses on those fields. If a field is not explicitly
defined, ElasticSearch guesses its type dynamically, in a schemaless manner.
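As a rough illustration of what an explicit mapping looks like, here is a minimal sketch written as a Python dict. The field names and the `explicitly_mapped` helper are assumptions made up for this example, not zamboni's actual mappings; the `'string'` type and `'not_analyzed'` option follow the older ElasticSearch mapping syntax used elsewhere in this document.

```python
# Hypothetical mapping for an illustrative add-on document.
# Field names are assumptions, not the real zamboni schema.
ADDON_MAPPING = {
    'properties': {
        # Stored as one exact term, so filters must match it verbatim.
        'slug': {'type': 'string', 'index': 'not_analyzed'},
        # Full-text field run through a named analyzer at index time.
        'name': {'type': 'string', 'analyzer': 'snowball'},
        'weekly_downloads': {'type': 'integer'},
        'created': {'type': 'date'},
    }
}


def explicitly_mapped(field, mapping=ADDON_MAPPING):
    """Return True if the field has an explicit definition in the mapping."""
    return field in mapping['properties']
```

Any field that is missing from `properties` (for example a `description` field here) would be left to ElasticSearch's dynamic type guessing.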
Indexing::
./manage.py cron reindex_addons # Index all the add-ons.
The reindex job uses celery to parallelize indexing. Running the job multiple
times will replace old index items with a new document. When starting out, set
up the mappings first, then let the indexing run for a while to populate the
index.
The index is maintained incrementally through post_save and post_delete hooks.
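The behaviour described above (re-indexing replaces a document, and save/delete hooks keep the index current) can be sketched with a toy in-memory index. This is not Django's signal API or zamboni's code; `InMemoryIndex`, `on_save`, and `on_delete` are hypothetical names standing in for the real index and the post_save/post_delete handlers.

```python
class InMemoryIndex:
    """Toy stand-in for a search index, keyed by document id."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        # Indexing an existing id overwrites the old document.
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        # Deleting a missing id is a no-op.
        self.docs.pop(doc_id, None)


def on_save(index, obj):
    """Hypothetical post_save handler: (re)index the saved object."""
    index.index(obj['id'], {'name': obj['name']})


def on_delete(index, obj):
    """Hypothetical post_delete handler: drop the object from the index."""
    index.delete(obj['id'])


index = InMemoryIndex()
addon = {'id': 12345, 'name': 'First name'}
on_save(index, addon)
addon['name'] = 'Renamed'
on_save(index, addon)  # same id, so the old document is replaced
```

Because documents are keyed by id, running the full reindex job again is safe: it rewrites entries rather than duplicating them.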
Setting up other indexes::
./manage.py index_stats # Index all the update and download counts.
./manage.py index_mkt_stats # Index contributions/installs/inapp-payments.
./manage.py index_stats/index_mkt_stats --addons 12345 1234 # Index specific addons/webapps.
./manage.py cron reindex_collections # Index all the collections.
./manage.py cron reindex_users # Index all the users.
./manage.py cron compatibility_report # Set up the compatibility index.
./manage.py weekly_downloads # Index weekly downloads.
Settings
@@ -42,6 +71,38 @@ your elasticsearch.yml (available at
.. literalinclude:: /../scripts/elasticsearch/elasticsearch.yml
If you installed ElasticSearch via apt-get/aptitude, place this configuration
file in `/etc/elasticsearch/` and restart ElasticSearch with
`/etc/init.d/elasticsearch restart`.
If you don't do this your results will be slightly different, but you probably
won't notice.
Querying ElasticSearch in Django
--------------------------------
Django models in zamboni use a SearchMixin whose methods communicate with
ElasticSearch through elasticutils. A notable one is `.search()`, which
returns an ElasticSearch search object that acts a lot like a Django ORM
object manager. `.filter(**kwargs)` can be run on this search object.
Sometimes a query returns nothing unless `.values_dict` is called on the
query set::
query_results = list(MyModel.search().filter(
a_field=a_str.lower()).values_dict('that_field'))
Common Pitfalls
---------------
*I got a CircularReference error on .search()* - check that you are passing
just a field's value into the filters, not a whole object.
*I indexed something into ElasticSearch, but my query returns nothing* - check
whether the query contains upper-case letters or hyphens. If so, try
lowercasing your query filter. For hyphens, set the field's mapping to not be
analyzed::
'my_field': {'type': 'string', 'index': 'not_analyzed'}
Also try running `.values_dict` on the query, as mentioned above.
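To see why upper-case letters and hyphens cause empty results, here is a small simulation in plain Python. It only approximates what a default analyzer does (lowercase the text and split on non-alphanumeric characters); `analyze` and `term_filter_matches` are hypothetical helpers for illustration, not part of ElasticSearch or elasticutils.

```python
import re


def analyze(text):
    """Rough approximation of a default analyzer:
    lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r'[^0-9a-z]+', text.lower()) if t]


def term_filter_matches(indexed_value, filter_value, analyzed=True):
    """Exact-term filter: the filter value itself is not analyzed,
    so it must equal one of the terms stored in the index."""
    terms = analyze(indexed_value) if analyzed else [indexed_value]
    return filter_value in terms
```

Under this model, `term_filter_matches('Firefox', 'Firefox')` fails because only the lowercased term was stored, while filtering on `'firefox'` succeeds. A hyphenated value such as `'foo-bar'` is split into two terms, so it only matches as a whole when the field is left unanalyzed (`analyzed=False`).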