Will Kahn-Greene 2013-01-10 11:13:05 -05:00
Parent 9bed7b5ce2
Commit 114f0c99db
9 changed files with 365 additions and 143 deletions

159
CHANGELOG

@@ -6,6 +6,165 @@ What's new in ElasticUtils
:local:
Version 0.7: in development
===========================
.. Note::
This is a *big* change. We switched from pyes to pyelasticsearch. In
doing that, we changed a handful of signatures, nixed some
functionality that didn't make any sense any more, and cleaned a
bunch of things up.
If this terrifies you, read through these notes carefully and/or
stay with v0.6.
**API-breaking changes:**
* **pyelasticsearch v0.3 or later now required.**
ElasticUtils now requires pyelasticsearch v0.3 or later and its
requirements.
* **elasticutils.PYES_VERSION is removed.**
Since we're not using pyes, we removed `elasticutils.PYES_VERSION`.
* **ElasticUtils no longer supports thrift.**
Pretty sure we did a lousy job of supporting it before---it was all
in the pyes code and we had no tests for it.
* **get_es() signatures have changed.**
* takes urls now instead of hosts
* dump_curl argument is now gone
* default_indexes argument is gone
* new max_retries argument added
The arguments correspond to those of the pyelasticsearch
`ElasticSearch` object.
ElasticUtils uses HTTP urls for connecting to ElasticSearch now.
Previously, you'd do::
get_es(hosts=['localhost:9200']) # Old way
Now you do::
get_es(urls=['http://localhost:9200']) # New way
The dump_curl argument was helpful for debugging, but we don't
really need it anymore. See the :ref:`debugging-chapter` for better
debugging methods.
`get_es()` now raises a `DeprecationWarning` if you pass in the
`hosts` argument.
* **S searches all indexes and doctypes by default.**
Previously, if you did::
S()
it'd search an index named "default" for doctypes "document". That
was dumb. Now it searches all indexes and all doctypes by default.
* **S.es_builder is gone.**
``es_builder()`` was there to get around problems with pyes' ES
class. The pyelasticsearch `ElasticSearch` class is more
straightforward, so we don't need to do circus shenanigans.
You can probably do what you need to with either the ``es()``
transform or by subclassing `S` and overriding the ``get_es()``
method.
* **MLT arguments changed.**
The `fields` argument in the constructor was renamed to `mlt_fields`
to be in line with ElasticSearch API names.
`MLT` now raises a `DeprecationWarning` if you pass in the `fields`
argument.
* **Django: changed settings.**
Changed ES_HOSTS setting to ES_URLS. This is both a name and a value
change. ES_URLS takes a list of strings, each of which is an HTTP URL.
You'll need to update your settings files from::
ES_HOSTS = ['localhost:9200'] # Old way
to::
ES_URLS = ['http://localhost:9200'] # New way
ES_DUMP_CURL is gone.
* **Django: removed the statsd code.**
* **Django: ESTestCase was renamed to ElasticTestCase.**
* **Django: Indexable.index() method no longer has bulk argument.**
The Indexable.index() method no longer does bulk indexing. The
way pyes did this was kind of squirrely and caused issues if you
didn't have the order of operations correct.
Now Indexable.index() only indexes a single document.
TODO: Need a way to do bulk indexing.
* **pyes -> pyelasticsearch changes.**
If you called ``.get_es()`` and got a pyes `ES` object and did
things with that (create index, create mappings, delete indexes,
indexing, cluster health, ...), you're going to need to make
some changes.
You can either:
1. rewrite that code to use pyelasticsearch `ElasticSearch`
equivalents, or
2. write and use your own ``get_es()`` function that returns
a pyes `ES` object
Rewriting shouldn't be too hard. The pyelasticsearch documentation
is pretty good and for most things, there's a 1-to-1 translation.
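Migrating old `host:port` values, whether they are passed to `get_es(hosts=...)` or stored in `ES_HOSTS`, to URLs is mechanical. The following is a minimal sketch, not ElasticUtils code. `hosts_to_urls` and `migrate_get_es_args` are hypothetical helpers, and the sketch assumes every host speaks plain HTTP. The second helper shows the general pattern of accepting a deprecated keyword while warning about it.

```python
import warnings


def hosts_to_urls(hosts):
    """Turn old-style 'host:port' strings into 'http://host:port' URLs.

    Hypothetical migration helper. Assumes plain HTTP; entries that
    already carry a scheme are passed through unchanged.
    """
    urls = []
    for host in hosts:
        if host.startswith(('http://', 'https://')):
            urls.append(host)
        else:
            urls.append('http://' + host)
    return urls


def migrate_get_es_args(urls=None, hosts=None, **kwargs):
    """Hypothetical shim: accept the old ``hosts`` keyword, but warn.

    Returns the keyword arguments with ``hosts`` translated to ``urls``.
    """
    if hosts is not None:
        warnings.warn('"hosts" is deprecated; pass "urls" instead',
                      DeprecationWarning, stacklevel=2)
        if urls is None:
            urls = hosts_to_urls(hosts)
    kwargs['urls'] = urls
    return kwargs


print(hosts_to_urls(['localhost:9200']))  # ['http://localhost:9200']
```

The same conversion applies to a Django settings file: `ES_URLS = hosts_to_urls(ES_HOSTS)` would produce the new-style list during a transition.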
**Changes:**
* **pyes is no longer a requirement.**
We no longer use pyes so you can remove it from your requirements.
* **Django: es_required_or_50x handles different exceptions.**
Previously it handled:
* pyes.urllib3.MaxRetryError
* pyes.exceptions.IndexMissingException
* pyes.exceptions.ElasticSearchException
We're not using pyes anymore, so now it handles:
* pyelasticsearch.exceptions.ConnectionError
* pyelasticsearch.exceptions.ElasticHttpNotFoundError
* pyelasticsearch.exceptions.Timeout
You probably don't need to do anything about this, but it's good to
know.
* **elasticutils.PYELASTICSEARCH_VERSION is added.**
You can see which version of pyelasticsearch is in use by doing::
from elasticutils import PYELASTICSEARCH_VERSION
print(PYELASTICSEARCH_VERSION)
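Because v0.3 or later is required, you might want to check the version in your own code. Compare tuples rather than strings, since as strings `'0.10' < '0.3'`. The following is a minimal sketch. It assumes you have the version as a dotted string such as `'0.3'`, so check what `PYELASTICSEARCH_VERSION` actually contains in your install. If it is already a tuple, compare it directly. `parse_version` and `is_supported` are hypothetical names.

```python
MINIMUM_PYELASTICSEARCH = (0, 3)


def parse_version(version):
    """Turn a dotted version string such as '0.3' into a tuple like (0, 3).

    Hypothetical helper. Each component keeps only its leading digits,
    so '0.3dev' becomes (0, 3).
    """
    parts = []
    for piece in version.split('.'):
        digits = ''
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_supported(version):
    """True if ``version`` is at least the v0.3 that ElasticUtils requires."""
    return parse_version(version) >= MINIMUM_PYELASTICSEARCH
```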
Version 0.6: Released January 17th, 2013
========================================


@@ -1,3 +1,5 @@
.. _debugging-chapter:
=========
Debugging
=========
@@ -12,21 +14,72 @@ Want to see how a score for a search result was calculated? See
:ref:`scores-and-explanations`.
get_es dump_curl
Logging
=======
pyelasticsearch logs to the ``pyelasticsearch`` logger using the
Python logging module. If you configure that to show DEBUG-level
messages, then it'll show the requests in curl form, responses, and
when it marks servers as dead.
Additionally, pyelasticsearch uses Requests, which logs to the
``requests`` logger using the Python logging module. If you configure
that to show INFO-level (or more verbose) messages, then you'll see
Requests' log messages too.
::
import logging
logging.getLogger('pyelasticsearch').setLevel(logging.DEBUG)
logging.getLogger('requests').setLevel(logging.DEBUG)
.. Note::
This assumes that logging is already set up with something like
this::
import logging
logging.basicConfig()
pyelasticsearch will log lines like::
DEBUG:pyelasticsearch:Making a request equivalent to this: curl -XGET 'http://localhost:9200/fooindex/testdoc/_search' -d '{"facets": {"topics": {"terms": {"field": "topics"}}}}'
You can copy and paste the curl line and it'll work on the command
line.
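The curl lines go through ordinary logging, so you can get something close to the old `ES_DUMP_CURL` file-path behavior by attaching a file handler to the `pyelasticsearch` logger. The following is a minimal sketch, and `dump_curl_to_file` is a hypothetical helper. Unlike the old setting, it captures every DEBUG message that pyelasticsearch emits, not just the curl lines. `logging.FileHandler` flushes after each record, so you can tail the file while your program runs.

```python
import logging


def dump_curl_to_file(path):
    """Send pyelasticsearch's DEBUG output (including curl lines) to ``path``.

    Hypothetical helper approximating the removed ES_DUMP_CURL file
    setting. Returns the handler so the caller can remove and close it.
    """
    logger = logging.getLogger('pyelasticsearch')
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    # Timestamp each record so you can line requests up with app activity.
    handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
    logger.addHandler(handler)
    return handler
```

Usage: `handler = dump_curl_to_file('/var/log/es_curl.log')` at startup, and then `logging.getLogger('pyelasticsearch').removeHandler(handler)` followed by `handler.close()` when you're done.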
.. Note::
If you add a ``pretty=1`` to the query string of the url that
you're curling, then ElasticSearch will return a prettified
response that's easier to read.
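If you build those URLs in Python, for example to rerun a logged request with readable output, the standard library can add the parameter without clobbering an existing query string. The following is a small sketch that uses Python 3's `urllib.parse`. On Python 2, the same functions live in `urlparse` and `urllib`. `add_pretty` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_pretty(url):
    """Return ``url`` with ``pretty=1`` in its query string.

    Other query parameters are kept in order; any existing ``pretty``
    value is replaced.
    """
    parts = urlsplit(url)
    params = [(key, value) for key, value in parse_qsl(parts.query)
              if key != 'pretty']
    params.append(('pretty', '1'))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_pretty('http://localhost:9200/fooindex/testdoc/_search'))
# http://localhost:9200/fooindex/testdoc/_search?pretty=1
```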
Seeing the query
================
You can pass a function into `get_es()` which will let you dump the
curl equivalents.
The `S` class has a `_build_query()` method that you can use to see the
body of the ElasticSearch request it generates from the parameters
you've specified so far. This is helpful for debugging ElasticUtils and
for checking whether it's building the request you expect.
For example::
from elasticutils import get_es
some_s = S()
print(some_s._build_query())
class CurlDumper(object):
def write(self, s):
print s
es = get_es(dump_curl=CurlDumper())
.. Note::
This is a "private" method, so we might change it at some point.
Having said that, it hasn't changed so far and is probably useful
for debugging.
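For larger searches, the raw return value can be hard to read when printed. Passing it through `json.dumps` gives you either an indented view or a compact string that you can paste into a curl `-d` argument. In this sketch, `query_body` is a hand-written stand-in for `some_s._build_query()`, because a bare `S()` does a `match_all` query. The sketch assumes that the real return value is a JSON-serializable dict.

```python
import json

# Stand-in for ``some_s._build_query()``; substitute the real call.
query_body = {'query': {'match_all': {}}}

# Indented and key-sorted, for reading.
print(json.dumps(query_body, indent=2, sort_keys=True))

# Compact, for pasting into ``curl -XGET ... -d '...'``.
compact = json.dumps(query_body, separators=(',', ':'))
print(compact)  # {"query":{"match_all":{}}}
```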
elasticsearch-head
@@ -43,22 +96,5 @@ elasticsearch-paramedic
https://github.com/karmi/elasticsearch-paramedic
elasticsearch-paramedic allows you to see the state and real-time statistics
of your ES cluster.
ngrep
=====
http://ngrep.sourceforge.net/
Sometimes, it helps to see exactly what's going over the wire. ngrep has a
horrible web-site, but it's a super handy tool for seeing the complete
conversation. You can use it like this::
$ ngrep -d any -p 9200
And then run your program and watch the output.
I often use this when testing sample ElasticUtils programs to see how
mappings, document values, facets, filters, queries and all that work.
elasticsearch-paramedic allows you to see the state and real-time
statistics of your ElasticSearch cluster.


@@ -28,21 +28,12 @@ file:
`es_required` will return and log a warning. This is useful while
developing, so you don't have to have ElasticSearch running.
.. data:: ES_DUMP_CURL
.. data:: ES_URLS
If set to a file path all the requests that `ElasticUtils` makes
will be dumped into the designated file.
This is a list of ElasticSearch urls. In development this will look
like::
If set to a class instance, calls the ``.write()`` method with
the curl equivalents.
See :ref:`django-debugging` for more details.
.. data:: ES_HOSTS
This is a list of ES hosts. In development this will look like::
ES_HOSTS = ['127.0.0.1:9200']
ES_URLS = ['http://localhost:9200']
.. data:: ES_INDEXES
@@ -84,23 +75,15 @@ file:
.. data:: ES_TIMEOUT
Defines the timeout for the `ES` connection. This defaults to 5
seconds.
Defines the timeout for the `ElasticSearch` connection. This
defaults to 5 seconds.
ES
==
ElasticSearch
=============
The `get_es()` in the Django contrib will helpfully cache your ES
objects thread-local.
It is built with the settings from your `django.conf.settings`.
.. Note::
`get_es()` only caches the `ES` if you don't pass in any override
arguments. If you pass in override arguments, it doesn't cache it
and instead creates a new one.
The `get_es()` in the Django contrib will use Django settings listed
above to build the ElasticSearch object.
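Roughly, this means gathering the connection settings into keyword arguments. The following is a hypothetical sketch of that step, not the contrib code itself. `es_kwargs_from_settings` is a made-up name, `SimpleNamespace` stands in for `django.conf.settings`, and the sketch reads only `ES_URLS` and `ES_TIMEOUT` (which defaults to 5 seconds, as documented above). The resulting `urls` and `timeout` keys match arguments that the pyelasticsearch `ElasticSearch` constructor accepts.

```python
from types import SimpleNamespace


def es_kwargs_from_settings(settings):
    """Collect ElasticSearch connection arguments from Django-style settings.

    Hypothetical sketch: ES_URLS is required; ES_TIMEOUT falls back to
    the documented 5-second default. The real get_es() may read more.
    """
    return {
        'urls': settings.ES_URLS,
        'timeout': getattr(settings, 'ES_TIMEOUT', 5),
    }


# SimpleNamespace plays the role of django.conf.settings here.
fake_settings = SimpleNamespace(ES_URLS=['http://localhost:9200'])
print(es_kwargs_from_settings(fake_settings))
# {'urls': ['http://localhost:9200'], 'timeout': 5}
```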
Using with Django ORM models
@@ -198,8 +181,8 @@ explicitly specifying `.get_mapping()`.
def get_mapping(cls):
"""Returns an ElasticSearch mapping."""
return {
# The id is an integer, so store it as such. ES would have
# inferred this just fine.
# The id is an integer, so store it as such. ElasticSearch
# would have inferred this just fine.
'id': {'type': 'integer'},
# The name is a name---so we shouldn't analyze it
@@ -293,60 +276,25 @@ Writing tests
:Requirements: Django, test_utils, nose
In `elasticutils.contrib.django.estestcase`, is `ESTestCase` which can
be subclassed in your app's test cases.
`elasticutils.contrib.django.estestcase` contains
`ElasticSearchTestCase`, which can be subclassed in your app's test
cases.
It does the following:
* If `ES_HOSTS` is empty it raises a `SkipTest`.
* `self.es` is available from the `ESTestCase` class and any subclasses.
* If `ES_URLS` is empty it raises a `SkipTest`.
* `self.es` is available from the `ElasticSearchTestCase` class and
any subclasses.
* At the end of the test case the index is wiped.
Example::
from elasticutils.djangolib import ESTestCase
from elasticutils.djangolib import ElasticSearchTestCase
class TestQueries(ESTestCase):
class TestQueries(ElasticSearchTestCase):
def test_query(self):
...
def test_locked_filters(self):
...
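The skip behavior is plain `unittest`. Raising `unittest.SkipTest` during class setup marks the class as skipped, so the tests don't all fail when no cluster is configured. The following is a minimal sketch with a hypothetical `SketchElasticTestCase` that reads `ES_URLS` from a class attribute. The real base class reads your Django settings and also wipes the index. That part isn't shown here.

```python
import unittest


class SketchElasticTestCase(unittest.TestCase):
    """Hypothetical stand-in showing only the skip-when-unconfigured part."""

    # In the real class this would come from Django settings.
    ES_URLS = []

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        if not cls.ES_URLS:
            # Skips every test in the class instead of failing them.
            raise unittest.SkipTest('ES_URLS is empty; skipping ElasticSearch tests')


class TestQueries(SketchElasticTestCase):
    def test_query(self):
        self.fail('should not run when ES_URLS is empty')
```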
.. _django-debugging:
Debugging
=========
You can set the ``settings.ES_DUMP_CURL`` to a few different things
all of which can be helpful in debugging ElasticUtils.
1. a file path
This will cause PyES to write the curl equivalents of the commands
it's sending to ElasticSearch to a file.
Example setting::
ES_DUMP_CURL = '/var/log/es_curl.log'
.. Note::
The file is not closed until the process ends. Because of that,
you don't see much in the file until it's done.
2. a class instance that has a ``.write()`` method
PyES will call the ``.write()`` method with the curl equivalent and
then you can do whatever you want with it.
For example, this writes curl equivalent output to stdout::
class CurlDumper(object):
def write(self, s):
print s
ES_DUMP_CURL = CurlDumper()


@@ -1,18 +1,21 @@
=============
Getting an ES
=============
.. _es-chapter:
ElasticUtils uses `pyes` which comes with a handy `ES` object. This
lets you work with ElasticSearch outside of what ElasticUtils can do.
===============================
Getting an ElasticSearch object
===============================
ElasticUtils uses `pyelasticsearch` which comes with a handy
`ElasticSearch` object. This lets you work with ElasticSearch outside
of what ElasticUtils can do.
To access this, you use `get_es()` which builds an `ElasticSearch`
object.
To access this, you use `get_es()` which builds an `ES`.
.. autofunction:: elasticutils.get_es
.. Warning::
.. seealso::
ElasticUtils works with ``pyes`` 0.15 and 0.16. The API for later
versions of pyes has changed too much and won't work with
ElasticUtils. We're planning to switch to something different in
the future.
http://pyelasticsearch.readthedocs.org/en/latest/api/
pyelasticsearch ElasticSearch documentation.


@@ -4,9 +4,9 @@
ElasticUtils
============
ElasticUtils is a Python library that gives you a Django queryset-like
API for `elasticsearch <http://elasticsearch.org/>`_ as well as some
other tools for making it easier to integrate elasticsearch into your
ElasticUtils is a Python library that gives you a chainable search API
for `ElasticSearch <http://elasticsearch.org/>`_ as well as some other
tools to make it easier to integrate ElasticSearch into your
application.
:Version: |release|


@@ -30,6 +30,5 @@ From git
Do::
$ git clone git://github.com/mozilla/elasticutils.git
For other ways to clone, see
`<https://github.com/mozilla/elasticutils>`_.
$ cd elasticutils
$ python setup.py install


@@ -18,12 +18,13 @@ For example::
This creates an `MLT` that will return documents that are like
document 2034 of type `addon` in the `addon_index`.
document with id 2034 of type `addon` in the `addon_index`.
You can specify an `S` and the `MLT` will derive the index, doctype,
ES object, and also use the search specified by the S in the body of
the More Like This request. This allows you to get documents like the
one specified that also meet query and filter criteria. For example::
You can pass it an `S` instance and the `MLT` will derive the index,
doctype, ElasticSearch object, and also use the search specified by
the `S` in the body of the More Like This request. This allows you to
get documents like the one specified that also meet query and filter
criteria. For example::
s = S().filter(product='firefox')
mlt = MLT(2034, s=s)
@@ -46,6 +47,8 @@ moreLikeThis query
ElasticSearch guide on the moreLikeThis query which specifies the
additional parameters you can use.
http://pyelasticsearch.readthedocs.org/en/latest/api/#pyelasticsearch.ElasticSearch.more_like_this
pyelasticsearch documentation for MLT
API
===


@@ -14,15 +14,16 @@ ElasticSearch simple.
For example::
q = (S().filter(product='firefox')
q = (S().es(urls=['http://localhost:9200'])
.indexes('addon_index')
.doctypes('addon')
.filter(product='firefox')
.filter(version='4.0', platform='all')
.query(title='Example')
.facet(products={'field': 'product', 'global': True})
.facet(versions={'field': 'version'})
.facet(platforms={'field': 'platform'})
.facet(types={'field': 'type'})
.doctypes('addon')
.indexes('addon_index')
.query(title='Example'))
.facet(types={'field': 'type'}))
The ElasticSearch REST API curl would look like this::
@@ -88,16 +89,31 @@ the `ElasticSearch JSON`.
All about S
===========
Basic untyped S
---------------
What is S?
----------
`S` is the class that you instantiate to create a search. For example::
`S` is the class that you instantiate to define an ElasticSearch
search. For example::
searcher = S()
This creates an `S` using the defaults:
`S` has a bunch of methods that all return a new `S` with additional
accumulated search criteria.
* uses an `ElasticSearch` object configured to connect to
``http://localhost:9200`` -- call ``.es()`` to specify
connection parameters
* searches across all indexes -- call ``.indexes()`` to specify
indexes
* searches across all doctypes -- call ``.doctypes()`` to specify
doctypes
Chainable
---------
`S` has methods that return a new `S` instance with the additional
specified criteria. In this way `S` is chainable and you can reuse `S`
objects for your searches.
For example::
@@ -114,19 +130,20 @@ all.
`s2` has a query.
`s3` has everything in `s2` plus a ``awesome=True`` filter.
`s3` has everything in `s2` with an ``awesome=True`` filter.
`s4` has everything in `s2` with an ``awesome=False`` filter.
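Reusing `s2` is safe because each method returns a new object instead of modifying the one you called it on. `ToyS` below is a hypothetical, much-simplified class that only records the steps in a tuple. It illustrates the copy-on-chain idea, not how `S` is implemented. The `title='example'` query is invented for illustration.

```python
class ToyS(object):
    """Toy copy-on-chain search object (not ElasticUtils' S)."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _clone(self, step):
        # Build a new object with one more step; never mutate self.
        return ToyS(self.steps + (step,))

    def query(self, **kwargs):
        return self._clone(('query', kwargs))

    def filter(self, **kwargs):
        return self._clone(('filter', kwargs))


s1 = ToyS()
s2 = s1.query(title='example')
s3 = s2.filter(awesome=True)
s4 = s2.filter(awesome=False)
print(len(s2.steps), len(s3.steps), len(s4.steps))  # 1 2 2
```

Because `s3` and `s4` each build on an untouched `s2`, neither filter leaks into the other.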
When you create an `S` with no type, it's called an "untyped S". If
you don't specify ``.values_dict`` or ``.values_list``, then your
Untyped S and Typed S
---------------------
When you create an `S` with no type, it's called an "untyped S".
If you don't specify ``.values_dict`` or ``.values_list``, then your
search results are in the form of a sequence of `DefaultMappingType`
instances. More about this in :ref:`queries-mapping-type`.
Typed S
-------
You can also construct a `typed S` which is an `S` with a
`MappingType` subclass. For example::
@@ -142,17 +159,74 @@ You can also construct a `typed S` which is an `S` with a
return 'mymappingtype'
results = S(MyMappingType).query(title__text='plugins')
results = (S(MyMappingType).es(urls=['http://localhost:9200'])
.query(title__text='plugins'))
``results`` will be an iterable of `MyMappingType` instances---one for
each search result.
Slicing
-------
`S` supports slicing, which lets you get back only the results you're
looking for.
For example::
some_s = S()
results = some_s[:10] # returns first 10 results
results = some_s[10:20] # returns results 10 through 19
The slicing is chainable, too::
some_s = S()[:10]
first_ten_pitchers = some_s.filter(position='pitcher')
first_ten_catchers = some_s.filter(position='catcher')
.. Note::
The slice happens on the ElasticSearch side---it doesn't pull all
the results back and then slice them in Python. Ew.
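ElasticSearch pages results with the `from` and `size` request-body parameters, so `[10:20]` corresponds to `from=10, size=10`. The following sketches that arithmetic. `slice_to_from_size` is hypothetical, and ElasticUtils might build the actual request differently, including how it handles open-ended slices. The sketch rejects stepped slices and assumes non-negative bounds.

```python
def slice_to_from_size(index):
    """Translate a Python slice into ElasticSearch ``from``/``size`` values.

    Hypothetical sketch: assumes non-negative bounds and no step.
    An open-ended slice (no stop) leaves ``size`` unset.
    """
    if index.step is not None:
        raise ValueError('stepped slices are not supported')
    start = index.start or 0
    params = {'from': start}
    if index.stop is not None:
        params['size'] = max(index.stop - start, 0)
    return params


print(slice_to_from_size(slice(10, 20)))  # {'from': 10, 'size': 10}
```

A class's `__getitem__` receives `some_s[10:20]` as `slice(10, 20, None)`, so a helper like this would receive that slice object.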
ElasticSearch connection, index and doctypes
============================================
`S` will generate an `ElasticSearch` object that connects to
``http://localhost:9200`` by default. That's usually not what you
want. You can use the ``es()`` method to specify the arguments used to
create the ElasticSearch object.
For example::
ES_URLS = ['http://localhost:9200']
q = S().es(urls=ES_URLS)
q = S().es(urls=ES_URLS, timeout=10)
See :ref:`es-chapter` for the list of arguments you can pass in.
An `untyped S` will search all indexes and all doctypes by default. If
that's not what you want, then you should use the ``indexes()`` and
``doctypes()`` methods.
For example::
q = S().indexes('someindex').doctypes('sometype')
If you're using a `typed S`, then you can specify the indexes and
doctypes in the `MappingType` subclass.
Match All
=========
By default ``S()`` with no filters or queries specified will do a
By default, ``S()`` with no filters or queries specified will do a
``match_all`` query in ElasticSearch.
.. seealso::


@@ -1,7 +1,7 @@
"""
This is a sample program that uses PyES ES to create an index, create
a mapping, and index some data. Then it uses ElasticUtils S to show
some behavior with facets.
This is a sample program that uses pyelasticsearch ElasticSearch
object to create an index, create a mapping, and index some data. Then
it uses ElasticUtils S to show some behavior with facets.
"""
from elasticutils import get_es, S