update elasticserarch install docs

2012-06-26 15:34:34 -07:00 · 2012-06-26 15:34:34 -07:00 · 5cbcfb926b
--- a/docs/topics/install-zamboni/elasticsearch.rst
+++ b/docs/topics/install-zamboni/elasticsearch.rst
@ -4,32 +4,61 @@
 =============
 elasticsearch
 =============
+elasticsearch is a (magical black-box beast) search server that is used like a
+key-value store.

 Installation::

    brew install elasticsearch  # or whatever your package manager is called.
+    apt-get/aptitude install elasticsearch

-Launch elasticsearch.  If you used homebrew, `brew info elasticsearch`
-will show you the commands to launch.
+Launch elasticsearch. If you used homebrew, `brew info elasticsearch`
+will show you the commands to launch. If you used aptitude, elasticsearch will
+come with an start-stop daemon in /etc/init.d as well as configuration files in
+/etc/elasticsearch.
+
+Mappings::
+    ./manage.py shell_plus
+    from stats.search import setup_indexes
+    setup_indexes()
+
+    ./manage.py shell_plus --settings=your_local_mkt_settings
+    from mkt.stats.search import setup_mkt_indexes
+    setup_mkt_indexes()
+
+Setting up the mappings is similar to defining the schema for a database or
+structure of a table. For different tables, we define different mappings that
+explicitly define fields to store and their type. We can also define what
+analyzer or tokenizer ElasticSearch uses on those fields. If a field is not
+explicitly defined, ElasticSearch dynamically guesses the field's type in a
+schemaless manner.

 Indexing::

    ./manage.py cron reindex_addons  # Index all the add-ons.

 The reindex job uses celery to parallelize indexing. Running the job multiple
-times will replace old index items with a new document.
+times will replace old index items with a new document. You will want to set up
+the mappings, and run the indexing for a bit upon starting.

 The index is maintained incrementally through post_save and post_delete hooks.

 Setting up other indexes::

+    ./manage.py index_stats  # Index all the update and download counts.
+
+    ./manage.py index_mkt_stats  # Index contributions/installs/inapp-payments.
+
+    ./manage.py index_stats/index_mkt_stats --addons 12345 1234 # Index
+    specific addons/webapps.
+
    ./manage.py cron reindex_collections  # Index all the collections.

    ./manage.py cron reindex_users  # Index all the users.

    ./manage.py cron compatibility_report  # Set up the compatibility index.

-    ./manage.py index_stats  # Index all the update and download counts.
+    ./manage.py weekly_downloads # Index weekly downloads.


 Settings
@ -42,6 +71,38 @@ your elasticsearch.yml (available at

 .. literalinclude:: /../scripts/elasticsearch/elasticsearch.yml

+If you installed ElasticSearch via apt-get/aptitude, place this configuration
+file in `/etc/elasticsearch/` and restart ElasticSearch with
+`/etc/init.d/elasticsearch restart`.

 If you don't do this your results will be slightly different, but you probably
 won't notice.
+
+
+Querying ElasticSearch in Django
+--------------------------------
+Django models in zamboni are instantiated with a SearchMixin that has
+functions that communicate with ElasticSearch through elasticutils. A notable
+one is `.search()` which returns an ElasticSearch search object which acts a
+lot like Django's ORM's object manager. `.filter(**kwargs)` can be run on this
+search object. Sometimes a query returns nothing unless `.values_dict` is
+called on the query set::
+
+    query_results = list(MyModel.search().filter(
+    a_field=a_str.lower()).values_dict('that_field'))
+
+
+Common Pitfalls
+---------------
+
+*I got a CircularReference error on .search()* - check that a whole object is
+not being passed into the filters, but rather just a field's value
+
+*I indexed something into ElasticSearch, but my query returns nothing* - check
+whether the query contains upper-case letters or hyphens. If so, try
+lowercasing your query filter. For hyphens, set the field's mapping to not be
+analyzed::
+
+    'my_field': {'type': 'string', 'index': 'not_analyzed'}
+
+Also try running .values_dict on the query as mentioned above