Mirror of https://github.com/mozilla/kitsune.git

Remove ES2 code-base.

This commit is contained in:
Parent: 422aba74f6
Commit: 3ebcbd5382

Makefile (2 changes)
@@ -54,7 +54,7 @@ build-full: .docker-build-pull
	touch .docker-build-full

pull: .env
	-GIT_COMMIT_SHORT= ${DC} pull base base-dev staticfiles locales full-no-locales full mariadb elasticsearch redis
	-GIT_COMMIT_SHORT= ${DC} pull base base-dev staticfiles locales full-no-locales full mariadb redis
	touch .docker-build-pull

rebuild: clean build
@@ -11,7 +11,6 @@ services:
    tty: true
    depends_on:
      - mariadb
      - elasticsearch
      - elasticsearch7
      - kibana
      - redis

@@ -30,7 +29,6 @@ services:
    user: ${UID:-kitsune}
    depends_on:
      - mariadb
      - elasticsearch
      - elasticsearch7
      - redis

@@ -48,7 +46,6 @@ services:
    env_file: .env-test
    depends_on:
      - mariadb
      - elasticsearch
      - elasticsearch7
      - redis

@@ -140,12 +137,6 @@ services:
    volumes:
      - mysqlvolume:/var/lib/mysql

  elasticsearch:
    image: elasticsearch:2.4
    ports:
      - "9201:9200"
      - "9301:9300"

  elasticsearch7:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
    environment:
@@ -27,8 +27,7 @@ Part 2: Developer's Guide
   wsgi
   email
   localization
   searchchapter
   search-v2
   search
   frontend
   notes

@@ -1,354 +0,0 @@
.. _search-chapter:

======
Search
======

.. warning::

   This section of documentation may be outdated.
   See :any:`search-v2` for up-to-date (but partial) documentation.

Kitsune uses `Elasticsearch <https://www.elastic.co/>`_ to
power its on-site search facility.

It gives us a number of advantages over MySQL's full-text search or
Google's site search:

* Much faster than MySQL's full-text search, and it reduces load on MySQL.
* We have total control over what results look like.
* We can adjust searches using non-visible content.
* We don't rely on Google reindexing the site.
* We can fine-tune the algorithm and scoring.


Installing Elasticsearch
========================

There's an installation guide on the Elasticsearch site:

https://www.elastic.co/guide/en/elasticsearch/reference/1.3/setup-service.html

We're currently using `1.2.4 <https://www.elastic.co/downloads/past-releases/elasticsearch-1-2-4>`_
in production.

The directory you install Elasticsearch in will hereafter be referred
to as ``ELASTICDIR``.

You can configure Elasticsearch with the configuration file at
``ELASTICDIR/config/elasticsearch.yml``.

Elasticsearch uses several settings in ``kitsune/settings.py`` that you
need to override in ``kitsune/settings_local.py``. Here's an example::

    # Connection information for Elastic
    ES_URLS = ['http://127.0.0.1:9200']
    ES_INDEXES = {'default': 'sumo_dev'}
    ES_WRITE_INDEXES = ES_INDEXES

These settings explained:

``ES_URLS``

    Defaults to ``['http://127.0.0.1:9200']``.

    Points to the url for your Elasticsearch instance.

    .. Warning::

       The url must match the host and port in
       ``ELASTICDIR/config/elasticsearch.yml``. So if you change it in
       one place, you must also change it in the other.


``ES_INDEXES``

    Mapping of ``'default'`` to the name of the index used for
    searching.

    The index name must be prefixed with the value of
    ``ES_INDEX_PREFIX``.

    Examples if ``ES_INDEX_PREFIX`` is set to ``'sumo'``::

        ES_INDEXES = {'default': 'sumo'}
        ES_INDEXES = {'default': 'sumo_20120213'}

        ES_INDEXES = {'default': 'tofurkey'}  # WRONG!


``ES_WRITE_INDEXES``

    Mapping of ``'default'`` to the name of the index used for
    indexing.

    The index name must be prefixed with the value of
    ``ES_INDEX_PREFIX``.

    Examples if ``ES_INDEX_PREFIX`` is set to ``'sumo'``::

        ES_WRITE_INDEXES = ES_INDEXES
        ES_WRITE_INDEXES = {'default': 'sumo'}
        ES_WRITE_INDEXES = {'default': 'sumo_20120213'}

        ES_WRITE_INDEXES = {'default': 'tofurkey'}  # WRONG!

    .. Note::

       The separate roles for indexes allow us to push mapping
       changes to production. In the first push, we push the
       mapping change and give ``ES_WRITE_INDEXES`` a different
       value. Then we reindex into the new index. Then we push a
       change updating ``ES_INDEXES`` to equal ``ES_WRITE_INDEXES``,
       allowing the search code to use the new index.

       If you're a developer, the best thing to do is have your
       ``ES_WRITE_INDEXES`` be the same as ``ES_INDEXES``. That way
       you can reindex and search without having to fiddle with
       settings in between.


There are a few other settings you can set in your
``kitsune/settings_local.py`` file that override ElasticUtils defaults. See
`the ElasticUtils docs
<https://elasticutils.readthedocs.io/en/latest/django.html#configuration>`_
for details.

Other things you can change:

``ES_INDEX_PREFIX``

    Defaults to ``'sumo'``.

    All indexes for this application must start with the index
    prefix. Indexes that don't start with the index prefix won't show
    up in index listings and can't be deleted through the esdelete
    subcommand or the search admin.

    .. Note::

       The index names in both ``ES_INDEXES`` and ``ES_WRITE_INDEXES``
       **must** start with this prefix.

``ES_LIVE_INDEXING``

    Defaults to ``False``.

    You can also set ``ES_LIVE_INDEXING`` in your
    ``kitsune/settings_local.py`` file. This affects whether Kitsune does
    Elasticsearch indexing when data changes in the ``post_save`` and
    ``pre_delete`` hooks.

    For tests, ``ES_LIVE_INDEXING`` is set to ``False`` except for
    Elasticsearch-specific tests, so we're not spending a ton of time
    indexing things we're not using.

``ES_TIMEOUT``

    Defaults to 5.

    This affects timeouts for search-related requests.

    If you're having problems with ES being slow, raising this number
    might be helpful.

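Pulling these settings together, here is a minimal sketch of a development
``kitsune/settings_local.py``. It only restates the settings documented above;
the concrete values are illustrative, not requirements::

    # kitsune/settings_local.py -- example values for local development only.
    ES_URLS = ['http://127.0.0.1:9200']      # must match elasticsearch.yml
    ES_INDEX_PREFIX = 'sumo'                 # every index name must start with this
    ES_INDEXES = {'default': 'sumo_dev'}
    ES_WRITE_INDEXES = ES_INDEXES            # read and write the same index locally
    ES_LIVE_INDEXING = True                  # index in post_save/pre_delete hooks
    ES_TIMEOUT = 10                          # raise if your local ES is slow
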

Using Elasticsearch
===================

Running
-------

Start Elasticsearch with::

    $ ELASTICDIR/bin/elasticsearch

That launches Elasticsearch in the background.


Indexing
--------

Do a complete reindexing of everything with::

    $ ./manage.py esreindex

This will delete the existing index specified by ``ES_WRITE_INDEXES``,
create a new one, and reindex everything in your database. On my
machine it takes under an hour.

If you need to get stuff done and don't want to wait for a full
indexing, you can index a percentage of things.

For example, this indexes 10% of your data ordered by id::

    $ ./manage.py esreindex --percent 10

This indexes 50% of your data ordered by id::

    $ ./manage.py esreindex --percent 50

I use this when I'm fiddling with mappings and the indexing code.

Another way of specifying a smaller number of things to index is by
indicating how recently updated things should be to be included::

    $ ./manage.py esreindex --hours-ago 2
    $ ./manage.py esreindex --minutes-ago 20
    $ ./manage.py esreindex --seconds-ago 90

Those options can be combined if you wish. Different indexes have
different ways of determining when something was last updated, but as
a whole this reindexes everything in every index (or in the indexes
specified with the ``--mapping_types`` option) that was updated within
the window you give.

You can also specify which mapping_types to index::

    $ ./manage.py esreindex --mapping_types questions_question,wiki_document

See ``--help`` for more details::

    $ ./manage.py esreindex --help

.. Note::

   Once you've indexed everything, if you have ``ES_LIVE_INDEXING``
   set to ``True``, you won't have to do it again unless indexing code
   changes. The models have ``post_save`` and ``pre_delete`` hooks
   that will update the index as the data changes.


.. Note::

   If you kick off indexing with the admin, then indexing gets done in
   chunks by celery tasks. If you need to halt indexing, you can purge
   the tasks with::

       $ celery -A kitsune purge

   If you do this often, it helps to write a shell script for it.

Health/statistics
-----------------

You can see Elasticsearch index status with::

    $ ./manage.py esstatus

This lists the indexes, tells you which ones are set to read and
write, and tells you how many documents are in the indexes by mapping
type.


Deleting indexes
----------------

You can use the search admin to delete the index.

On the command line, you can do::

    $ ./manage.py esdelete <index-name>


Implementation details
----------------------

Kitsune uses `elasticutils <https://github.com/mozilla/elasticutils>`_
and `pyelasticsearch
<https://pyelasticsearch.readthedocs.io/en/latest/>`_.

Most of our code is in the ``search`` app in ``kitsune/search/``.

Models in Kitsune that are indexable use ``SearchMixin`` defined in
``models.py``.

Utility functions are implemented in ``es_utils.py``.

Sub-commands for ``manage.py`` are implemented in
``management/commands/``.

Searching on the site
=====================

Scoring
-------

These are the default weights that apply to all searches:

wiki (aka kb)::

    document_title__match     6
    document_content__match   1
    document_keywords__match  8
    document_summary__match   2

questions (aka support forums)::

    question_title__match           4
    question_content__match         3
    question_answer_content__match  3

forums (aka contributor forums)::

    post_title__match    2
    post_content__match  1


Elasticsearch is built on top of Lucene, so the `Lucene documentation
on scoring
<http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html>`_
covers how a document is scored with respect to the search query and its
contents. The weights modify that---they're query-level boosts.

Additionally, `this blog post from 2006 <http://www.supermind.org/blog/378>`_
is really helpful for understanding the implications of
the way things are scored.

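To make "query-level boost" concrete, here is a minimal sketch of how the wiki
weights above could be expressed directly against Elasticsearch. This is an
illustration only, not the actual Kitsune query-building code: it assumes a
local cluster on port 9200, the ``elasticsearch`` Python client, and an index
named ``sumo_dev``; the field names mirror the weight table above::

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://127.0.0.1:9200"])
    text = "firefox crashes"

    query = {
        "query": {
            "bool": {
                "should": [
                    # One clause per field; "boost" carries the weight from the table above.
                    {"match": {"document_title": {"query": text, "boost": 6}}},
                    {"match": {"document_content": {"query": text, "boost": 1}}},
                    {"match": {"document_keywords": {"query": text, "boost": 8}}},
                    {"match": {"document_summary": {"query": text, "boost": 2}}},
                ]
            }
        }
    }

    results = es.search(index="sumo_dev", body=query)
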
Filters
-------

We use a series of filters on document_tag, question_tag, and other
properties of documents like `has_helpful`, `is_locked`, `is_archived`,
etc.

In Elasticsearch, filters remove items from the result set, but don't
affect the scoring.

We cannot apply weights to filtered fields.


Regular search
--------------

You can start a `regular` search from the front page or from the
search form on any article page.

Regular search does the following:

1. searches only kb and support forums
2. (filter) kb articles are tagged with the product (e.g. "desktop")
3. (filter) kb articles must not be archived
4. (filter) kb articles must be in the Troubleshooting (10) or
   How-to (20) categories
5. (filter) support forum posts are tagged with the product
   (e.g. "desktop")
6. (filter) support forum posts must have an answer marked as helpful
7. (filter) support forum posts must not be archived

It scores as specified above.


Ask A Question search
---------------------

An `Ask a question` or `AAQ` search is any search performed within
the AAQ workflow. The only difference from `regular` search is that `AAQ`
search also shows forum posts that have no answer marked as helpful.
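
For illustration only, the `regular` search filters above correspond roughly to
a filter clause like the following in the Elasticsearch query DSL. This is not
the actual Kitsune query-building code, and the field names here are
illustrative guesses rather than the real mapping fields::

    # Hypothetical filter for kb documents in a regular search.
    kb_filter = {
        "bool": {
            "must": [
                {"term": {"product": "desktop"}},             # tagged with the product
                {"term": {"document_is_archived": False}},    # must not be archived
                {"terms": {"document_category": [10, 20]}},   # Troubleshooting or How-to
            ]
        }
    }
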
@@ -1,21 +1,21 @@
from nose.tools import eq_

from pyquery import PyQuery as pq

from kitsune.forums.tests import ThreadFactory
from kitsune.questions.tests import AnswerFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.templatetags.jinja_helpers import urlparams
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.users.tests import UserFactory
from kitsune.wiki.tests import DocumentFactory, RevisionFactory, ApprovedRevisionFactory
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, RevisionFactory


class UserSearchTests(ElasticTestCase):
class UserSearchTests(Elastic7TestCase):
    """Tests for the Community Hub user search page."""

    client_class = LocalizingClient
    search_tests = True

    def test_no_results(self):
        UserFactory(username="foo", profile__name="Foo Bar")

@@ -45,10 +45,11 @@ class UserSearchTests(ElasticTestCase):
        eq_(len(doc(".results-user")), 2)


class LandingTests(ElasticTestCase):
class LandingTests(Elastic7TestCase):
    """Tests for the Community Hub landing page."""

    client_class = LocalizingClient
    search_tests = True

    def test_top_contributors(self):
        """Verify the top contributors appear."""

@@ -104,9 +105,11 @@ class LandingTests(ElasticTestCase):
        assert "we are SUMO!" in doc("#recent-threads td").html()


class TopContributorsTests(ElasticTestCase):
class TopContributorsTests(Elastic7TestCase):
    """Tests for the Community Hub top contributors page."""

    search_tests = True

    client_class = LocalizingClient

    def test_invalid_area(self):
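Across these test modules the commit swaps ``ElasticTestCase`` for
``Elastic7TestCase`` and tags the classes with ``search_tests = True``. A rough
sketch of the resulting pattern is below; the base class, the flag, and the
``self.refresh()`` helper all appear in the diffs in this commit, while the
test body itself is an invented illustration::

    from kitsune.search.v2.tests import Elastic7TestCase
    from kitsune.users.tests import UserFactory


    class ExampleCommunityTests(Elastic7TestCase):
        # Lets the runner select or exclude these tests via the search_tests attribute.
        search_tests = True

        def test_user_shows_up_in_search(self):
            UserFactory(username="foo", profile__name="Foo Bar")
            # Make newly indexed documents visible to queries before asserting.
            self.refresh()
            # ...assertions against the search results would go here.
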
@@ -1,4 +1,4 @@
from datetime import datetime, date, timedelta
from datetime import date, datetime, timedelta

from nose.tools import eq_

@@ -9,13 +9,14 @@ from kitsune.community.utils import (
)
from kitsune.products.tests import ProductFactory
from kitsune.questions.tests import AnswerFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.wiki.tests import DocumentFactory, RevisionFactory


class TopContributorTests(ElasticTestCase):
class TopContributorTests(Elastic7TestCase):
    client_class = LocalizingClient
    search_tests = True

    def test_top_contributors_kb(self):
        d = DocumentFactory(locale="en-US")

@@ -24,8 +25,6 @@ class TopContributorTests(ElasticTestCase):
        RevisionFactory(document=d)
        r4 = RevisionFactory(document=d, created=date.today() - timedelta(days=91))

        self.refresh()

        # By default, we should only get 2 top contributors back.
        top, _ = top_contributors_kb()
        eq_(2, len(top))
@@ -1,14 +1,15 @@
from nose.tools import eq_

from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.search.tests import ElasticTestCase


class ContributorsMetricsTests(ElasticTestCase):
class ContributorsMetricsTests(Elastic7TestCase):
    """Tests for the Community Hub user search page."""

    client_class = LocalizingClient
    search_tests = True

    def test_it_works(self):
        url = reverse("community.metrics")
@@ -4,9 +4,6 @@ from datetime import datetime
from django.contrib.auth.models import User
from django.db import models

from kitsune.search.models import (
    SearchMixin,
)
from kitsune.sumo.models import ModelBase


@@ -58,7 +55,7 @@ class Tweet(ModelBase):
        return tweet["text"]


class Reply(ModelBase, SearchMixin):
class Reply(ModelBase):
    """A reply from an AoA contributor.

    The Tweet table gets truncated regularly so we can't use it for metrics.
@@ -1,27 +1,19 @@
import datetime
import time

from django.contrib.auth.models import User
from django.contrib.contenttypes.fields import GenericRelation
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from django.db.models import Q
from django.db.models.signals import pre_save
from django.contrib.contenttypes.fields import GenericRelation
from django.contrib.auth.models import User

from tidings.models import NotificationsMixin

from kitsune import forums
from kitsune.access.utils import has_perm, perm_is_defined_on
from kitsune.flagit.models import FlaggedObject
from kitsune.sumo.models import ModelBase
from kitsune.sumo.templatetags.jinja_helpers import urlparams, wiki_to_html
from kitsune.sumo.urlresolvers import reverse
from kitsune.sumo.models import ModelBase
from kitsune.search.models import (
    SearchMappingType,
    SearchMixin,
    register_for_indexing,
    register_mapping_type,
)


def _last_post_from(posts, exclude_post=None):

@@ -108,7 +100,7 @@ class Forum(NotificationsMixin, ModelBase):
        return [f for f in Forum.objects.all() if f.allows_viewing_by(user)]


class Thread(NotificationsMixin, ModelBase, SearchMixin):
class Thread(NotificationsMixin, ModelBase):
    title = models.CharField(max_length=255)
    forum = models.ForeignKey("Forum", on_delete=models.CASCADE)
    created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
@ -190,104 +182,6 @@ class Thread(NotificationsMixin, ModelBase, SearchMixin):
|
|||
# If self.last_post is None, and this was called from Post.delete,
|
||||
# then Post.delete will erase the thread, as well.
|
||||
|
||||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
return ThreadMappingType
|
||||
|
||||
|
||||
@register_mapping_type
|
||||
class ThreadMappingType(SearchMappingType):
|
||||
seconds_ago_filter = "last_post__created__gte"
|
||||
|
||||
@classmethod
|
||||
def search(cls):
|
||||
return super(ThreadMappingType, cls).search().order_by("created")
|
||||
|
||||
@classmethod
|
||||
def get_model(cls):
|
||||
return Thread
|
||||
|
||||
@classmethod
|
||||
def get_query_fields(cls):
|
||||
return ["post_title", "post_content"]
|
||||
|
||||
@classmethod
|
||||
def get_mapping(cls):
|
||||
return {
|
||||
"properties": {
|
||||
"id": {"type": "long"},
|
||||
"model": {"type": "string", "index": "not_analyzed"},
|
||||
"url": {"type": "string", "index": "not_analyzed"},
|
||||
"indexed_on": {"type": "integer"},
|
||||
"created": {"type": "integer"},
|
||||
"updated": {"type": "integer"},
|
||||
"post_forum_id": {"type": "integer"},
|
||||
"post_title": {"type": "string", "analyzer": "snowball"},
|
||||
"post_is_sticky": {"type": "boolean"},
|
||||
"post_is_locked": {"type": "boolean"},
|
||||
"post_author_id": {"type": "integer"},
|
||||
"post_author_ord": {"type": "string", "index": "not_analyzed"},
|
||||
"post_content": {
|
||||
"type": "string",
|
||||
"analyzer": "snowball",
|
||||
"store": "yes",
|
||||
"term_vector": "with_positions_offsets",
|
||||
},
|
||||
"post_replies": {"type": "integer"},
|
||||
}
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def extract_document(cls, obj_id, obj=None):
|
||||
"""Extracts interesting thing from a Thread and its Posts"""
|
||||
if obj is None:
|
||||
model = cls.get_model()
|
||||
obj = model.objects.select_related("last_post").get(pk=obj_id)
|
||||
|
||||
d = {}
|
||||
d["id"] = obj.id
|
||||
d["model"] = cls.get_mapping_type_name()
|
||||
d["url"] = obj.get_absolute_url()
|
||||
d["indexed_on"] = int(time.time())
|
||||
|
||||
# TODO: Sphinx stores created and updated as seconds since the
|
||||
# epoch, so we convert them to that format here so that the
|
||||
# search view works correctly. When we ditch Sphinx, we should
|
||||
# see if it's faster to filter on ints or whether we should
|
||||
# switch them to dates.
|
||||
d["created"] = int(time.mktime(obj.created.timetuple()))
|
||||
|
||||
if obj.last_post is not None:
|
||||
d["updated"] = int(time.mktime(obj.last_post.created.timetuple()))
|
||||
else:
|
||||
d["updated"] = None
|
||||
|
||||
d["post_forum_id"] = obj.forum.id
|
||||
d["post_title"] = obj.title
|
||||
d["post_is_sticky"] = obj.is_sticky
|
||||
d["post_is_locked"] = obj.is_locked
|
||||
|
||||
d["post_replies"] = obj.replies
|
||||
|
||||
author_ids = set()
|
||||
author_ords = set()
|
||||
content = []
|
||||
|
||||
posts = Post.objects.filter(thread_id=obj.id).select_related("author")
|
||||
for post in posts:
|
||||
author_ids.add(post.author.id)
|
||||
author_ords.add(post.author.username)
|
||||
content.append(post.content)
|
||||
|
||||
d["post_author_id"] = list(author_ids)
|
||||
d["post_author_ord"] = list(author_ords)
|
||||
d["post_content"] = content
|
||||
|
||||
return d
|
||||
|
||||
|
||||
register_for_indexing("forums", Thread)
|
||||
|
||||
|
||||
class Post(ModelBase):
|
||||
thread = models.ForeignKey("Thread", on_delete=models.CASCADE)
|
||||
|
@ -368,9 +262,6 @@ class Post(ModelBase):
|
|||
return wiki_to_html(self.content)
|
||||
|
||||
|
||||
register_for_indexing("forums", Post, instance_to_indexee=lambda p: p.thread)
|
||||
|
||||
|
||||
def user_pre_save(sender, instance, **kw):
|
||||
"""When a user's username is changed, we must reindex the threads
|
||||
they participated in.
|
||||
|
|
|
@ -1,56 +0,0 @@
|
|||
from nose.tools import eq_
|
||||
|
||||
from kitsune.forums.models import ThreadMappingType
|
||||
from kitsune.forums.tests import ThreadFactory, PostFactory
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.users.tests import UserFactory
|
||||
|
||||
|
||||
class TestPostUpdate(ElasticTestCase):
|
||||
def test_added(self):
|
||||
# Nothing exists before the test starts
|
||||
eq_(ThreadMappingType.search().count(), 0)
|
||||
|
||||
# Creating a new Thread does create a new document in the index.
|
||||
new_thread = ThreadFactory()
|
||||
self.refresh()
|
||||
eq_(ThreadMappingType.search().count(), 1)
|
||||
|
||||
# Saving a new post in a thread doesn't create a new
|
||||
# document in the index. Therefore, the count remains 1.
|
||||
#
|
||||
# TODO: This is ambiguous: it's not clear whether we correctly
|
||||
# updated the document in the index or whether the post_save
|
||||
# hook didn't kick off. Need a better test.
|
||||
PostFactory(thread=new_thread)
|
||||
self.refresh()
|
||||
eq_(ThreadMappingType.search().count(), 1)
|
||||
|
||||
def test_deleted(self):
|
||||
# Nothing exists before the test starts
|
||||
eq_(ThreadMappingType.search().count(), 0)
|
||||
|
||||
# Creating a new Thread does create a new document in the index.
|
||||
new_thread = ThreadFactory()
|
||||
self.refresh()
|
||||
eq_(ThreadMappingType.search().count(), 1)
|
||||
|
||||
# Deleting the thread deletes the document in the index.
|
||||
new_thread.delete()
|
||||
self.refresh()
|
||||
eq_(ThreadMappingType.search().count(), 0)
|
||||
|
||||
def test_thread_is_reindexed_on_username_change(self):
|
||||
search = ThreadMappingType.search()
|
||||
|
||||
u = UserFactory(username="dexter")
|
||||
ThreadFactory(creator=u, title="Hello")
|
||||
|
||||
self.refresh()
|
||||
eq_(search.query(post_title="hello")[0]["post_author_ord"], ["dexter"])
|
||||
|
||||
# Change the username and verify the index.
|
||||
u.username = "walter"
|
||||
u.save()
|
||||
self.refresh()
|
||||
eq_(search.query(post_title="hello")[0]["post_author_ord"], ["walter"])
|
|
@ -2,11 +2,13 @@ from nose.tools import eq_
|
|||
from pyquery import PyQuery as pq
|
||||
|
||||
from kitsune.products.tests import ProductFactory
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.search.v2.tests import Elastic7TestCase
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
|
||||
|
||||
class HomeTestCase(ElasticTestCase):
|
||||
class HomeTestCase(Elastic7TestCase):
|
||||
search_tests = True
|
||||
|
||||
def test_home(self):
|
||||
"""Verify that home page renders products."""
|
||||
|
||||
|
|
|
@ -1,22 +1,19 @@
|
|||
from django.conf import settings
|
||||
from django.core.cache import cache
|
||||
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery as pq
|
||||
|
||||
from kitsune.products.models import HOT_TOPIC_SLUG
|
||||
from kitsune.products.tests import ProductFactory, TopicFactory
|
||||
from kitsune.questions.models import QuestionLocale
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.search.v2.tests import Elastic7TestCase
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.wiki.tests import (
|
||||
DocumentFactory,
|
||||
ApprovedRevisionFactory,
|
||||
HelpfulVoteFactory,
|
||||
)
|
||||
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, HelpfulVoteFactory
|
||||
|
||||
|
||||
class ProductViewsTestCase(ElasticTestCase):
|
||||
class ProductViewsTestCase(Elastic7TestCase):
|
||||
search_tests = True
|
||||
|
||||
def test_products(self):
|
||||
"""Verify that /products page renders products."""
|
||||
# Create some products.
|
||||
|
|
|
@@ -1,15 +1,11 @@
import logging
import time
from datetime import datetime, timedelta

from django.conf import settings
from django.core.management.base import BaseCommand
from django.db import connection, transaction

from kitsune.questions.models import Question, QuestionMappingType, Answer
from kitsune.search.es_utils import ES_EXCEPTIONS, get_documents
from kitsune.search.tasks import index_task
from kitsune.search.utils import to_class_path
from kitsune.questions.models import Question, Answer

from kitsune.search.v2.es7_utils import index_objects_bulk
@@ -56,38 +52,3 @@ class Command(BaseCommand):
        )
        index_objects_bulk.delay("QuestionDocument", q_ids)
        index_objects_bulk.delay("AnswerDocument", answer_ids)

        # elastic v2 code:
        try:
            # So... the first time this runs, it'll handle 160K
            # questions or so which stresses everything. Thus we
            # do it in chunks because otherwise this won't work.
            #
            # After we've done this for the first time, we can nix
            # the chunking code.

            from kitsune.search.utils import chunked

            for chunk in chunked(q_ids, 100):

                # Fetch all the documents we need to update.
                es_docs = get_documents(QuestionMappingType, chunk)

                log.info("Updating %d index documents", len(es_docs))

                documents = []

                # For each document, update the data and stick it
                # back in the index.
                for doc in es_docs:
                    doc["question_is_archived"] = True
                    doc["indexed_on"] = int(time.time())
                    documents.append(doc)

                QuestionMappingType.bulk_index(documents)

        except ES_EXCEPTIONS:
            # Something happened with ES, so let's push index
            # updating into an index_task which retries when it
            # fails because of ES issues.
            index_task.delay(to_class_path(QuestionMappingType), q_ids)
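The replacement code path above just queues two Celery tasks. As a minimal
sketch of that call pattern (the task and document-class names are taken from
the diff above; the id lists are placeholders)::

    from kitsune.search.v2.es7_utils import index_objects_bulk

    # Queue bulk reindexing of specific questions and answers through Celery,
    # mirroring the calls in the management command above.
    index_objects_bulk.delay("QuestionDocument", [1, 2, 3])
    index_objects_bulk.delay("AnswerDocument", [4, 5, 6])
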
@ -1,6 +1,5 @@
|
|||
import logging
|
||||
import re
|
||||
import time
|
||||
from datetime import date, datetime, timedelta
|
||||
from urllib.parse import urlparse
|
||||
|
||||
|
@ -21,22 +20,13 @@ from django.urls import resolve
|
|||
from django.utils.translation import pgettext, override as translation_override
|
||||
from elasticsearch7 import ElasticsearchException
|
||||
from product_details import product_details
|
||||
from taggit.models import Tag, TaggedItem
|
||||
from taggit.models import Tag
|
||||
|
||||
from kitsune.flagit.models import FlaggedObject
|
||||
from kitsune.products.models import Product, Topic
|
||||
from kitsune.questions import config
|
||||
from kitsune.questions.managers import AnswerManager, QuestionLocaleManager, QuestionManager
|
||||
from kitsune.questions.tasks import update_answer_pages, update_question_votes
|
||||
from kitsune.search.es_utils import UnindexMeBro
|
||||
from kitsune.search.models import (
|
||||
SearchMappingType,
|
||||
SearchMixin,
|
||||
register_for_indexing,
|
||||
register_mapping_type,
|
||||
)
|
||||
from kitsune.search.tasks import index_task
|
||||
from kitsune.search.utils import to_class_path
|
||||
from kitsune.sumo.models import LocaleField, ModelBase
|
||||
from kitsune.sumo.templatetags.jinja_helpers import urlparams, wiki_to_html
|
||||
from kitsune.sumo.urlresolvers import reverse, split_path
|
||||
|
@ -57,7 +47,7 @@ class AlreadyTakenException(Exception):
|
|||
pass
|
||||
|
||||
|
||||
class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
|
||||
class Question(ModelBase, BigVocabTaggableMixin):
|
||||
"""A support question."""
|
||||
|
||||
title = models.CharField(max_length=255)
|
||||
|
@ -369,10 +359,6 @@ class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
|
|||
cache.add(cache_key, tags, settings.CACHE_MEDIUM_TIMEOUT)
|
||||
return tags
|
||||
|
||||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
return QuestionMappingType
|
||||
|
||||
@classmethod
|
||||
def get_serializer(cls, serializer_type="full"):
|
||||
# Avoid circular import
|
||||
|
@ -698,171 +684,6 @@ class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
|
|||
return images
|
||||
|
||||
|
||||
@register_mapping_type
|
||||
class QuestionMappingType(SearchMappingType):
|
||||
seconds_ago_filter = "updated__gte"
|
||||
list_keys = [
|
||||
"topic",
|
||||
"product",
|
||||
"question_tag",
|
||||
"question_answer_content",
|
||||
"question_answer_creator",
|
||||
]
|
||||
|
||||
@classmethod
|
||||
def get_model(cls):
|
||||
return Question
|
||||
|
||||
@classmethod
|
||||
def get_query_fields(cls):
|
||||
return ["question_title", "question_content", "question_answer_content"]
|
||||
|
||||
@classmethod
|
||||
def get_localized_fields(cls):
|
||||
# This is the same list as `get_query_fields`, but it doesn't
|
||||
# have to be, which is why it is typed twice.
|
||||
return ["question_title", "question_content", "question_answer_content"]
|
||||
|
||||
@classmethod
|
||||
def get_mapping(cls):
|
||||
return {
|
||||
"properties": {
|
||||
"id": {"type": "long"},
|
||||
"model": {"type": "string", "index": "not_analyzed"},
|
||||
"url": {"type": "string", "index": "not_analyzed"},
|
||||
"indexed_on": {"type": "integer"},
|
||||
"created": {"type": "integer"},
|
||||
"updated": {"type": "integer"},
|
||||
"product": {"type": "string", "index": "not_analyzed"},
|
||||
"topic": {"type": "string", "index": "not_analyzed"},
|
||||
"question_title": {"type": "string", "analyzer": "snowball"},
|
||||
"question_content": {
|
||||
"type": "string",
|
||||
"analyzer": "snowball",
|
||||
# TODO: Stored because originally, this is the
|
||||
# only field we were excerpting on. Standardize
|
||||
# one way or the other.
|
||||
"store": "yes",
|
||||
"term_vector": "with_positions_offsets",
|
||||
},
|
||||
"question_answer_content": {"type": "string", "analyzer": "snowball"},
|
||||
"question_num_answers": {"type": "integer"},
|
||||
"question_is_solved": {"type": "boolean"},
|
||||
"question_is_locked": {"type": "boolean"},
|
||||
"question_is_archived": {"type": "boolean"},
|
||||
"question_has_answers": {"type": "boolean"},
|
||||
"question_has_helpful": {"type": "boolean"},
|
||||
"question_creator": {"type": "string", "index": "not_analyzed"},
|
||||
"question_answer_creator": {"type": "string", "index": "not_analyzed"},
|
||||
"question_num_votes": {"type": "integer"},
|
||||
"question_num_votes_past_week": {"type": "integer"},
|
||||
"question_tag": {"type": "string", "index": "not_analyzed"},
|
||||
"question_locale": {"type": "string", "index": "not_analyzed"},
|
||||
}
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def extract_document(cls, obj_id, obj=None):
|
||||
"""Extracts indexable attributes from a Question and its answers."""
|
||||
fields = [
|
||||
"id",
|
||||
"title",
|
||||
"content",
|
||||
"num_answers",
|
||||
"solution_id",
|
||||
"is_locked",
|
||||
"is_archived",
|
||||
"created",
|
||||
"updated",
|
||||
"num_votes_past_week",
|
||||
"locale",
|
||||
"product_id",
|
||||
"topic_id",
|
||||
"is_spam",
|
||||
]
|
||||
composed_fields = ["creator__username"]
|
||||
all_fields = fields + composed_fields
|
||||
|
||||
if obj is None:
|
||||
# Note: Need to keep this in sync with
|
||||
# tasks.update_question_vote_chunk.
|
||||
model = cls.get_model()
|
||||
obj = model.objects.values(*all_fields).get(pk=obj_id)
|
||||
else:
|
||||
fixed_obj = dict([(field, getattr(obj, field)) for field in fields])
|
||||
fixed_obj["creator__username"] = obj.creator.username
|
||||
obj = fixed_obj
|
||||
|
||||
if obj["is_spam"]:
|
||||
raise UnindexMeBro()
|
||||
|
||||
d = {}
|
||||
d["id"] = obj["id"]
|
||||
d["model"] = cls.get_mapping_type_name()
|
||||
|
||||
# We do this because get_absolute_url is an instance method
|
||||
# and we don't want to create an instance because it's a DB
|
||||
# hit and expensive. So we do it by hand. get_absolute_url
|
||||
# doesn't change much, so this is probably ok.
|
||||
d["url"] = reverse("questions.details", kwargs={"question_id": obj["id"]})
|
||||
|
||||
d["indexed_on"] = int(time.time())
|
||||
|
||||
d["created"] = int(time.mktime(obj["created"].timetuple()))
|
||||
d["updated"] = int(time.mktime(obj["updated"].timetuple()))
|
||||
|
||||
topics = Topic.objects.filter(id=obj["topic_id"])
|
||||
products = Product.objects.filter(id=obj["product_id"])
|
||||
d["topic"] = [t.slug for t in topics]
|
||||
d["product"] = [p.slug for p in products]
|
||||
|
||||
d["question_title"] = obj["title"]
|
||||
d["question_content"] = obj["content"]
|
||||
d["question_num_answers"] = obj["num_answers"]
|
||||
d["question_is_solved"] = bool(obj["solution_id"])
|
||||
d["question_is_locked"] = obj["is_locked"]
|
||||
d["question_is_archived"] = obj["is_archived"]
|
||||
d["question_has_answers"] = bool(obj["num_answers"])
|
||||
|
||||
d["question_creator"] = obj["creator__username"]
|
||||
d["question_num_votes"] = QuestionVote.objects.filter(question=obj["id"]).count()
|
||||
d["question_num_votes_past_week"] = obj["num_votes_past_week"]
|
||||
|
||||
d["question_tag"] = list(
|
||||
TaggedItem.tags_for(Question, Question(pk=obj_id)).values_list("name", flat=True)
|
||||
)
|
||||
|
||||
d["question_locale"] = obj["locale"]
|
||||
|
||||
answer_values = list(
|
||||
Answer.objects.filter(question=obj_id, is_spam=False).values_list(
|
||||
"content", "creator__username"
|
||||
)
|
||||
)
|
||||
|
||||
d["question_answer_content"] = [a[0] for a in answer_values]
|
||||
d["question_answer_creator"] = list(set(a[1] for a in answer_values))
|
||||
|
||||
if not answer_values:
|
||||
d["question_has_helpful"] = False
|
||||
else:
|
||||
d["question_has_helpful"] = (
|
||||
Answer.objects.filter(question=obj_id).filter(votes__helpful=True).exists()
|
||||
)
|
||||
|
||||
return d
|
||||
|
||||
|
||||
register_for_indexing("questions", Question)
|
||||
register_for_indexing(
|
||||
"questions",
|
||||
TaggedItem,
|
||||
instance_to_indexee=(
|
||||
lambda i: (i.content_object if isinstance(i.content_object, Question) else None)
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
class QuestionMetaData(ModelBase):
|
||||
"""Metadata associated with a support question."""
|
||||
|
||||
|
@ -933,7 +754,7 @@ class QuestionLocale(ModelBase):
|
|||
verbose_name = "AAQ enabled locale"
|
||||
|
||||
|
||||
class Answer(ModelBase, SearchMixin):
|
||||
class Answer(ModelBase):
|
||||
"""An answer to a support question."""
|
||||
|
||||
question = models.ForeignKey("Question", on_delete=models.CASCADE, related_name="answers")
|
||||
|
@ -1153,10 +974,6 @@ class Answer(ModelBase, SearchMixin):
|
|||
cache.add(cache_key, images, settings.CACHE_MEDIUM_TIMEOUT)
|
||||
return images
|
||||
|
||||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
return AnswerMetricsMappingType
|
||||
|
||||
@classmethod
|
||||
def get_serializer(cls, serializer_type="full"):
|
||||
# Avoid circular import
|
||||
|
@ -1177,113 +994,6 @@ class Answer(ModelBase, SearchMixin):
|
|||
self.save()
|
||||
|
||||
|
||||
@register_mapping_type
|
||||
class AnswerMetricsMappingType(SearchMappingType):
|
||||
seconds_ago_filter = "updated__gte"
|
||||
list_keys = ["product"]
|
||||
|
||||
@classmethod
|
||||
def get_model(cls):
|
||||
return Answer
|
||||
|
||||
@classmethod
|
||||
def get_index_group(cls):
|
||||
return "metrics"
|
||||
|
||||
@classmethod
|
||||
def get_mapping(cls):
|
||||
return {
|
||||
"properties": {
|
||||
"id": {"type": "long"},
|
||||
"model": {"type": "string", "index": "not_analyzed"},
|
||||
"url": {"type": "string", "index": "not_analyzed"},
|
||||
"indexed_on": {"type": "integer"},
|
||||
"created": {"type": "date"},
|
||||
"locale": {"type": "string", "index": "not_analyzed"},
|
||||
"product": {"type": "string", "index": "not_analyzed"},
|
||||
"is_solution": {"type": "boolean"},
|
||||
"creator_id": {"type": "long"},
|
||||
"by_asker": {"type": "boolean"},
|
||||
"helpful_count": {"type": "integer"},
|
||||
"unhelpful_count": {"type": "integer"},
|
||||
}
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def extract_document(cls, obj_id, obj=None):
|
||||
"""Extracts indexable attributes from an Answer."""
|
||||
fields = ["id", "created", "creator_id", "question_id"]
|
||||
composed_fields = [
|
||||
"question__locale",
|
||||
"question__solution_id",
|
||||
"question__creator_id",
|
||||
"question__product_id",
|
||||
]
|
||||
all_fields = fields + composed_fields
|
||||
|
||||
if obj is None:
|
||||
model = cls.get_model()
|
||||
obj_dict = model.objects.values(*all_fields).get(pk=obj_id)
|
||||
else:
|
||||
obj_dict = dict([(field, getattr(obj, field)) for field in fields])
|
||||
obj_dict["question__locale"] = obj.question.locale
|
||||
obj_dict["question__solution_id"] = obj.question.solution_id
|
||||
obj_dict["question__creator_id"] = obj.question.creator_id
|
||||
obj_dict["question__product_id"] = obj.question.product_id
|
||||
|
||||
d = {}
|
||||
d["id"] = obj_dict["id"]
|
||||
d["model"] = cls.get_mapping_type_name()
|
||||
|
||||
# We do this because get_absolute_url is an instance method
|
||||
# and we don't want to create an instance because it's a DB
|
||||
# hit and expensive. So we do it by hand. get_absolute_url
|
||||
# doesn't change much, so this is probably ok.
|
||||
url = reverse("questions.details", kwargs={"question_id": obj_dict["question_id"]})
|
||||
d["url"] = urlparams(url, hash="answer-%s" % obj_dict["id"])
|
||||
|
||||
d["indexed_on"] = int(time.time())
|
||||
|
||||
d["created"] = obj_dict["created"]
|
||||
|
||||
d["locale"] = obj_dict["question__locale"]
|
||||
d["is_solution"] = obj_dict["id"] == obj_dict["question__solution_id"]
|
||||
d["creator_id"] = obj_dict["creator_id"]
|
||||
d["by_asker"] = obj_dict["creator_id"] == obj_dict["question__creator_id"]
|
||||
|
||||
products = Product.objects.filter(id=obj_dict["question__product_id"])
|
||||
d["product"] = [p.slug for p in products]
|
||||
|
||||
related_votes = AnswerVote.objects.filter(answer_id=obj_dict["id"])
|
||||
d["helpful_count"] = related_votes.filter(helpful=True).count()
|
||||
d["unhelpful_count"] = related_votes.filter(helpful=False).count()
|
||||
|
||||
return d
|
||||
|
||||
|
||||
register_for_indexing("answers", Answer)
|
||||
# This below is needed to update the is_solution field on the answer.
|
||||
register_for_indexing("answers", Question, instance_to_indexee=(lambda i: i.solution))
|
||||
|
||||
|
||||
register_for_indexing("questions", Answer, instance_to_indexee=lambda a: a.question)
|
||||
|
||||
|
||||
# This below is needed to update the is_solution field on the answer.
|
||||
def reindex_questions_answers(sender, instance, **kw):
|
||||
"""When a question is saved, we need to reindex it's answers.
|
||||
|
||||
This is needed because the solution may have changed."""
|
||||
if instance.id:
|
||||
answer_ids = instance.answers.all().values_list("id", flat=True)
|
||||
index_task.delay(to_class_path(AnswerMetricsMappingType), list(answer_ids))
|
||||
|
||||
|
||||
post_save.connect(
|
||||
reindex_questions_answers, sender=Question, dispatch_uid="questions_reindex_answers"
|
||||
)
|
||||
|
||||
|
||||
def user_pre_save(sender, instance, **kw):
|
||||
"""When a user's username is changed, we must reindex the questions
|
||||
they participated in.
|
||||
|
@ -1319,9 +1029,6 @@ class QuestionVote(ModelBase):
|
|||
VoteMetadata.objects.create(vote=self, key=key, value=value[:VOTE_METADATA_MAX_LENGTH])
|
||||
|
||||
|
||||
register_for_indexing("questions", QuestionVote, instance_to_indexee=lambda v: v.question)
|
||||
|
||||
|
||||
class AnswerVote(ModelBase):
|
||||
"""Helpful or Not Helpful vote on Answer."""
|
||||
|
||||
|
@ -1337,13 +1044,6 @@ class AnswerVote(ModelBase):
|
|||
VoteMetadata.objects.create(vote=self, key=key, value=value[:VOTE_METADATA_MAX_LENGTH])
|
||||
|
||||
|
||||
# TODO: We only need to update the helpful bit. It's possible
|
||||
# we could ignore all AnswerVotes that aren't helpful and if
|
||||
# they're marked as helpful, then update the index. Look into
|
||||
# this.
|
||||
register_for_indexing("questions", AnswerVote, instance_to_indexee=lambda v: v.answer.question)
|
||||
|
||||
|
||||
class VoteMetadata(ModelBase):
|
||||
"""Metadata for question and answer votes."""
|
||||
|
||||
|
|
|
@ -1,250 +0,0 @@
|
|||
from datetime import datetime, timedelta
|
||||
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery as pq
|
||||
|
||||
from kitsune.products.tests import ProductFactory
|
||||
from kitsune.questions.models import QuestionMappingType, AnswerMetricsMappingType
|
||||
from kitsune.questions.tests import (
|
||||
QuestionFactory,
|
||||
AnswerFactory,
|
||||
AnswerVoteFactory,
|
||||
QuestionVoteFactory,
|
||||
)
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.sumo.tests import LocalizingClient
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.users.models import Profile
|
||||
from kitsune.users.tests import UserFactory
|
||||
|
||||
|
||||
class QuestionUpdateTests(ElasticTestCase):
|
||||
def test_added(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
# Create a question--that adds one document to the index.
|
||||
q = QuestionFactory(title="Does this test work?")
|
||||
self.refresh()
|
||||
eq_(search.count(), 1)
|
||||
eq_(search.query(question_title__match="test").count(), 1)
|
||||
|
||||
# No answer exist, so none should be searchable.
|
||||
eq_(search.query(question_answer_content__match="only").count(), 0)
|
||||
|
||||
# Create an answer for the question. It should be searchable now.
|
||||
AnswerFactory(content="There's only one way to find out!", question=q)
|
||||
self.refresh()
|
||||
eq_(search.query(question_answer_content__match="only").count(), 1)
|
||||
|
||||
# Make sure that there's only one question document in the index--creating an answer
|
||||
# should have updated the existing one.
|
||||
eq_(search.count(), 1)
|
||||
|
||||
def test_question_no_answers_deleted(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
q = QuestionFactory(title="Does this work?")
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="work").count(), 1)
|
||||
|
||||
q.delete()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="work").count(), 0)
|
||||
|
||||
def test_question_one_answer_deleted(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
q = QuestionFactory(title="are model makers the new pink?")
|
||||
a = AnswerFactory(content="yes.", question=q)
|
||||
self.refresh()
|
||||
|
||||
# Question and its answers are a single document--so the index count should be only 1.
|
||||
eq_(search.query(question_title__match="pink").count(), 1)
|
||||
|
||||
# After deleting the answer, the question document should remain.
|
||||
a.delete()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="pink").count(), 1)
|
||||
|
||||
# Delete the question and it should be removed from the index.
|
||||
q.delete()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="pink").count(), 0)
|
||||
|
||||
def test_question_questionvote(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
# Create a question and verify it doesn't show up in a
|
||||
# query for num_votes__gt=0.
|
||||
q = QuestionFactory(title="model makers will inherit the earth")
|
||||
self.refresh()
|
||||
eq_(search.filter(question_num_votes__gt=0).count(), 0)
|
||||
|
||||
# Add a QuestionVote--it should show up now.
|
||||
QuestionVoteFactory(question=q)
|
||||
self.refresh()
|
||||
eq_(search.filter(question_num_votes__gt=0).count(), 1)
|
||||
|
||||
def test_questions_tags(self):
|
||||
"""Make sure that adding tags to a Question causes it to
|
||||
refresh the index.
|
||||
|
||||
"""
|
||||
tag = "hiphop"
|
||||
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
|
||||
q = QuestionFactory()
|
||||
self.refresh()
|
||||
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
|
||||
q.tags.add(tag)
|
||||
self.refresh()
|
||||
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 1)
|
||||
q.tags.remove(tag)
|
||||
self.refresh()
|
||||
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
|
||||
|
||||
def test_question_is_unindexed_on_creator_delete(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
q = QuestionFactory(title="Does this work?")
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="work").count(), 1)
|
||||
|
||||
q.creator.delete()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="work").count(), 0)
|
||||
|
||||
def test_question_is_reindexed_on_username_change(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
u = UserFactory(username="dexter")
|
||||
|
||||
QuestionFactory(creator=u, title="Hello")
|
||||
AnswerFactory(creator=u, content="I love you")
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="hello")[0]["question_creator"], "dexter")
|
||||
query = search.query(question_answer_content__match="love")
|
||||
eq_(query[0]["question_answer_creator"], ["dexter"])
|
||||
|
||||
# Change the username and verify the index.
|
||||
u.username = "walter"
|
||||
u.save()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="hello")[0]["question_creator"], "walter")
|
||||
query = search.query(question_answer_content__match="love")
|
||||
eq_(query[0]["question_answer_creator"], ["walter"])
|
||||
|
||||
def test_question_spam_is_unindexed(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
q = QuestionFactory(title="I am spam")
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="spam").count(), 1)
|
||||
|
||||
q.is_spam = True
|
||||
q.save()
|
||||
self.refresh()
|
||||
eq_(search.query(question_title__match="spam").count(), 0)
|
||||
|
||||
def test_answer_spam_is_unindexed(self):
|
||||
search = QuestionMappingType.search()
|
||||
|
||||
a = AnswerFactory(content="I am spam")
|
||||
self.refresh()
|
||||
eq_(search.query(question_answer_content__match="spam").count(), 1)
|
||||
|
||||
a.is_spam = True
|
||||
a.save()
|
||||
self.refresh()
|
||||
eq_(search.query(question_answer_content__match="spam").count(), 0)
|
||||
|
||||
|
||||
class QuestionSearchTests(ElasticTestCase):
|
||||
"""Tests about searching for questions"""
|
||||
|
||||
def test_case_insensitive_search(self):
|
||||
"""Ensure the default searcher is case insensitive."""
|
||||
q = QuestionFactory(title="lolrus", content="I am the lolrus.")
|
||||
AnswerVoteFactory(answer__question=q)
|
||||
self.refresh()
|
||||
# This is an AND operation
|
||||
result = QuestionMappingType.search().query(
|
||||
question_title__match="LOLRUS", question_content__match="LOLRUS"
|
||||
)
|
||||
assert result.count() > 0
|
||||
|
||||
|
||||
class AnswerMetricsTests(ElasticTestCase):
|
||||
def test_add_and_delete(self):
|
||||
"""Adding an answer should add it to the index.
|
||||
|
||||
Deleting should delete it.
|
||||
"""
|
||||
a = AnswerFactory()
|
||||
self.refresh()
|
||||
eq_(AnswerMetricsMappingType.search().count(), 1)
|
||||
|
||||
a.delete()
|
||||
self.refresh()
|
||||
eq_(AnswerMetricsMappingType.search().count(), 0)
|
||||
|
||||
def test_data_in_index(self):
|
||||
"""Verify the data we are indexing."""
|
||||
p = ProductFactory()
|
||||
q = QuestionFactory(locale="pt-BR", product=p)
|
||||
a = AnswerFactory(question=q)
|
||||
|
||||
self.refresh()
|
||||
|
||||
eq_(AnswerMetricsMappingType.search().count(), 1)
|
||||
data = AnswerMetricsMappingType.search()[0]
|
||||
eq_(data["locale"], q.locale)
|
||||
eq_(data["product"], [p.slug])
|
||||
eq_(data["creator_id"], a.creator_id)
|
||||
eq_(data["is_solution"], False)
|
||||
eq_(data["by_asker"], False)
|
||||
|
||||
# Mark as solution and verify
|
||||
q.solution = a
|
||||
q.save()
|
||||
|
||||
self.refresh()
|
||||
data = AnswerMetricsMappingType.search()[0]
|
||||
eq_(data["is_solution"], True)
|
||||
|
||||
# Make the answer creator to be the question creator and verify.
|
||||
a.creator = q.creator
|
||||
a.save()
|
||||
|
||||
self.refresh()
|
||||
data = AnswerMetricsMappingType.search()[0]
|
||||
eq_(data["by_asker"], True)
|
||||
|
||||
|
||||
class SupportForumTopContributorsTests(ElasticTestCase):
|
||||
client_class = LocalizingClient
|
||||
|
||||
def test_top_contributors(self):
|
||||
# There should be no top contributors since there are no answers.
|
||||
response = self.client.get(reverse("questions.list", args=["all"]))
|
||||
eq_(200, response.status_code)
|
||||
doc = pq(response.content)
|
||||
eq_(0, len(doc("#top-contributors ol li")))
|
||||
|
||||
# Add an answer, we now have a top conributor.
|
||||
a = AnswerFactory()
|
||||
self.refresh()
|
||||
response = self.client.get(reverse("questions.list", args=["all"]))
|
||||
eq_(200, response.status_code)
|
||||
doc = pq(response.content)
|
||||
lis = doc("#top-contributors ol li")
|
||||
eq_(1, len(lis))
|
||||
eq_(Profile.objects.get(user=a.creator).display_name, lis[0].text)
|
||||
|
||||
# Make answer 91 days old. There should no be top contributors.
|
||||
a.created = datetime.now() - timedelta(days=91)
|
||||
a.save()
|
||||
self.refresh()
|
||||
response = self.client.get(reverse("questions.list", args=["all"]))
|
||||
eq_(200, response.status_code)
|
||||
doc = pq(response.content)
|
||||
eq_(0, len(doc("#top-contributors ol li")))
|
|
@ -30,7 +30,7 @@ from kitsune.questions.tests import (
|
|||
TestCaseBase,
|
||||
tags_eq,
|
||||
)
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.search.v2.tests import Elastic7TestCase
|
||||
from kitsune.sumo import googleanalytics
|
||||
from kitsune.sumo.tests import TestCase
|
||||
from kitsune.tags.tests import TagFactory
|
||||
|
@ -506,7 +506,9 @@ class AddExistingTagTests(TestCaseBase):
|
|||
add_existing_tag("nonexistent tag", self.untagged_question.tags)
|
||||
|
||||
|
||||
class OldQuestionsArchiveTest(ElasticTestCase):
|
||||
class OldQuestionsArchiveTest(Elastic7TestCase):
|
||||
search_tests = True
|
||||
|
||||
def test_archive_old_questions(self):
|
||||
last_updated = datetime.now() - timedelta(days=100)
|
||||
|
||||
|
|
|
@ -3,13 +3,12 @@ import json
|
|||
import random
|
||||
from datetime import datetime, timedelta
|
||||
from string import ascii_letters
|
||||
from unittest import mock
|
||||
|
||||
from django.conf import settings
|
||||
from django.contrib.auth.models import User
|
||||
from django.core import mail
|
||||
from django.core.cache import cache
|
||||
|
||||
from unittest import mock
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery as pq
|
||||
from taggit.models import Tag
|
||||
|
@ -26,7 +25,7 @@ from kitsune.questions.tests import (
|
|||
tags_eq,
|
||||
)
|
||||
from kitsune.questions.views import NO_TAG, UNAPPROVED_TAG
|
||||
from kitsune.search.tests import ElasticTestCase
|
||||
from kitsune.search.v2.tests import Elastic7TestCase
|
||||
from kitsune.sumo.templatetags.jinja_helpers import urlparams
|
||||
from kitsune.sumo.tests import (
|
||||
LocalizingClient,
|
||||
|
@ -1454,7 +1453,9 @@ class ProductForumTemplateTestCase(TestCaseBase):
|
|||
assert openbadges.title not in product_list_html
|
||||
|
||||
|
||||
class RelatedThingsTestCase(ElasticTestCase):
|
||||
class RelatedThingsTestCase(Elastic7TestCase):
|
||||
search_tests = True
|
||||
|
||||
def setUp(self):
|
||||
super(RelatedThingsTestCase, self).setUp()
|
||||
self.question = QuestionFactory(
|
||||
|
@ -1484,7 +1485,6 @@ class RelatedThingsTestCase(ElasticTestCase):
|
|||
AnswerVoteFactory(answer=a3, helpful=True)
|
||||
|
||||
cache.clear()
|
||||
self.refresh()
|
||||
|
||||
response = get(self.client, "questions.details", args=[self.question.id])
|
||||
doc = pq(response.content)
|
||||
|
@ -1502,7 +1502,6 @@ class RelatedThingsTestCase(ElasticTestCase):
|
|||
d1.save()
|
||||
|
||||
cache.clear()
|
||||
self.refresh()
|
||||
|
||||
response = get(self.client, "questions.details", args=[self.question.id])
|
||||
doc = pq(response.content)
|
||||
|
|
|
@@ -16,17 +16,15 @@ from kitsune.questions.tests (
    TestCaseBase,
)
from kitsune.questions.views import parse_troubleshooting
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.templatetags.jinja_helpers import urlparams
from kitsune.sumo.tests import LocalizingClient, eq_msg, get, template_used
from kitsune.sumo.urlresolvers import reverse
from kitsune.users.tests import UserFactory, add_permission
from kitsune.wiki.tests import DocumentFactory, RevisionFactory


# Note:
# Tests using the ElasticTestCase are not being run because of this line: `-a '!search_tests'`
class AAQSearchTests(ElasticTestCase):
class AAQSearchTests(Elastic7TestCase):
    search_tests = True
    client_class = LocalizingClient

    def test_bleaching(self):

@@ -66,11 +64,6 @@ class AAQSearchTests(ElasticTestCase):
        TopicFactory(title="Fix problems", slug="fix-problems", product=p)
        q = QuestionFactory(product=p, title="CupcakesQuestion cupcakes")

        d = DocumentFactory(title="CupcakesKB cupcakes", category=10)
        d.products.add(p)

        RevisionFactory(document=d, is_approved=True)

        self.refresh()

        url = urlparams(

@@ -82,22 +75,17 @@ class AAQSearchTests(ElasticTestCase):
        eq_(200, response.status_code)

        assert b"CupcakesQuestion" in response.content
        assert b"CupcakesKB" in response.content

        # Verify that archived articles and questions aren't shown...
        # Archive both and they shouldn't appear anymore.
        q.is_archived = True
        q.save()
        d.is_archived = True
        d.save()

        self.refresh()

        response = self.client.get(url, follow=True)
        eq_(200, response.status_code)

        assert b"CupcakesQuestion" not in response.content
        assert b"CupcakesKB" not in response.content

    def test_search_suggestion_questions_locale(self):
        """Verifies the right languages show up in search suggestions."""

@@ -683,7 +671,8 @@ class TestRateLimiting(TestCaseBase):
        eq_(4, Answer.objects.count())


class TestStats(ElasticTestCase):
class TestStats(Elastic7TestCase):
    search_tests = True
    client_class = LocalizingClient

    def test_stats(self):

@@ -1,383 +0,0 @@
|
|||
import logging
|
||||
import time
|
||||
from datetime import datetime
|
||||
|
||||
import requests
|
||||
|
||||
from django.conf import settings
|
||||
from django.contrib import admin
|
||||
from django.core.exceptions import PermissionDenied
|
||||
from django.http import HttpResponseRedirect, Http404
|
||||
from django.shortcuts import render
|
||||
|
||||
from kitsune.search import synonym_utils
|
||||
from kitsune.search.es_utils import (
|
||||
get_doctype_stats,
|
||||
get_indexes,
|
||||
delete_index,
|
||||
ES_EXCEPTIONS,
|
||||
get_indexable,
|
||||
CHUNK_SIZE,
|
||||
recreate_indexes,
|
||||
write_index,
|
||||
read_index,
|
||||
all_read_indexes,
|
||||
all_write_indexes,
|
||||
)
|
||||
from kitsune.search.models import Record, get_mapping_types, Synonym
|
||||
from kitsune.search.tasks import index_chunk_task, update_synonyms_task
|
||||
from kitsune.search.utils import chunked, to_class_path
|
||||
|
||||
|
||||
log = logging.getLogger("k.es")
|
||||
|
||||
|
||||
def handle_reset(request):
|
||||
"""Resets records"""
|
||||
for rec in Record.objects.outstanding():
|
||||
rec.mark_fail("Cancelled.")
|
||||
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
|
||||
class DeleteError(Exception):
|
||||
pass
|
||||
|
||||
|
||||
def create_batch_id():
|
||||
"""Returns a batch_id"""
|
||||
# TODO: This is silly, but it's a good enough way to distinguish
|
||||
# between batches by looking at a Record. This is just over the
|
||||
# number of seconds in a day.
|
||||
return str(int(time.time()))[-6:]
|
||||
|
||||
|
||||
def handle_delete(request):
|
||||
"""Deletes an index"""
|
||||
index_to_delete = request.POST.get("delete_index")
|
||||
es_indexes = [name for (name, count) in get_indexes()]
|
||||
|
||||
# Rule 1: Has to start with the ES_INDEX_PREFIX.
|
||||
if not index_to_delete.startswith(settings.ES_INDEX_PREFIX):
|
||||
raise DeleteError('"%s" is not a valid index name.' % index_to_delete)
|
||||
|
||||
# Rule 2: Must be an existing index.
|
||||
if index_to_delete not in es_indexes:
|
||||
raise DeleteError('"%s" does not exist.' % index_to_delete)
|
||||
|
||||
# Rule 3: Don't delete the default read index.
|
||||
# TODO: When the critical index exists, this should be "Don't
|
||||
# delete the critical read index."
|
||||
if index_to_delete == read_index("default"):
|
||||
raise DeleteError('"%s" is the default read index.' % index_to_delete)
|
||||
|
||||
# The index is ok to delete
|
||||
delete_index(index_to_delete)
|
||||
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
|
||||
class ReindexError(Exception):
|
||||
pass
|
||||
|
||||
|
||||
def reindex(mapping_type_names):
|
||||
"""Reindex all instances of a given mapping type with celery tasks
|
||||
|
||||
:arg mapping_type_names: list of mapping types to reindex
|
||||
|
||||
"""
|
||||
outstanding = Record.objects.outstanding().count()
|
||||
if outstanding > 0:
|
||||
raise ReindexError("There are %s outstanding chunks." % outstanding)
|
||||
|
||||
batch_id = create_batch_id()
|
||||
|
||||
# Break up all the things we want to index into chunks. This
|
||||
# chunkifies by class then by chunk size.
|
||||
chunks = []
|
||||
for cls, indexable in get_indexable(mapping_types=mapping_type_names):
|
||||
chunks.extend((cls, chunk) for chunk in chunked(indexable, CHUNK_SIZE))
|
||||
|
||||
for cls, id_list in chunks:
|
||||
index = cls.get_index()
|
||||
chunk_name = "Indexing: %s %d -> %d" % (
|
||||
cls.get_mapping_type_name(),
|
||||
id_list[0],
|
||||
id_list[-1],
|
||||
)
|
||||
rec = Record.objects.create(batch_id=batch_id, name=chunk_name)
|
||||
index_chunk_task.delay(index, batch_id, rec.id, (to_class_path(cls), id_list))
|
||||
|
||||
|
||||
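# The reindex() helper above leans on chunked() from kitsune.search.utils,
# which is imported but not shown in this diff. A rough, hypothetical sketch
# of the kind of helper it is used as (successive fixed-size slices); the
# real implementation may differ in details:
def chunked(iterable, chunk_size):
    """Yield successive lists of at most ``chunk_size`` items from ``iterable``."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk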
def handle_recreate_index(request):
|
||||
"""Deletes an index, recreates it, and reindexes it."""
|
||||
groups = [
|
||||
name.replace("check_", "")
|
||||
for name in list(request.POST.keys())
|
||||
if name.startswith("check_")
|
||||
]
|
||||
|
||||
indexes = [write_index(group) for group in groups]
|
||||
recreate_indexes(indexes=indexes)
|
||||
|
||||
mapping_types_names = [
|
||||
mt.get_mapping_type_name() for mt in get_mapping_types() if mt.get_index_group() in groups
|
||||
]
|
||||
reindex(mapping_types_names)
|
||||
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
|
||||
def handle_reindex(request):
|
||||
"""Caculates and kicks off indexing tasks"""
|
||||
mapping_type_names = [
|
||||
name.replace("check_", "")
|
||||
for name in list(request.POST.keys())
|
||||
if name.startswith("check_")
|
||||
]
|
||||
|
||||
reindex(mapping_type_names)
|
||||
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
|
||||
def search(request):
|
||||
"""Render the admin view containing search tools"""
|
||||
if not request.user.has_perm("search.reindex"):
|
||||
raise PermissionDenied
|
||||
|
||||
error_messages = []
|
||||
stats = {}
|
||||
|
||||
if "reset" in request.POST:
|
||||
try:
|
||||
return handle_reset(request)
|
||||
except ReindexError as e:
|
||||
error_messages.append("Error: %s" % e.message)
|
||||
|
||||
if "reindex" in request.POST:
|
||||
try:
|
||||
return handle_reindex(request)
|
||||
except ReindexError as e:
|
||||
error_messages.append("Error: %s" % e.message)
|
||||
|
||||
if "recreate_index" in request.POST:
|
||||
try:
|
||||
return handle_recreate_index(request)
|
||||
except ReindexError as e:
|
||||
error_messages.append("Error: %s" % e.message)
|
||||
|
||||
if "delete_index" in request.POST:
|
||||
try:
|
||||
return handle_delete(request)
|
||||
except DeleteError as e:
|
||||
error_messages.append("Error: %s" % e.message)
|
||||
except ES_EXCEPTIONS as e:
|
||||
error_messages.append("Error: {0}".format(repr(e)))
|
||||
|
||||
stats = None
|
||||
write_stats = None
|
||||
es_deets = None
|
||||
indexes = []
|
||||
|
||||
try:
|
||||
# TODO: SUMO has a single ES_URL and that's the ZLB and does
|
||||
# the balancing. If that ever changes and we have multiple
|
||||
# ES_URLs, then this should get fixed.
|
||||
es_deets = requests.get(settings.ES_URLS[0]).json()
|
||||
except requests.exceptions.RequestException:
|
||||
pass
|
||||
|
||||
stats = {}
|
||||
for index in all_read_indexes():
|
||||
try:
|
||||
stats[index] = get_doctype_stats(index)
|
||||
except ES_EXCEPTIONS:
|
||||
stats[index] = None
|
||||
|
||||
write_stats = {}
|
||||
for index in all_write_indexes():
|
||||
try:
|
||||
write_stats[index] = get_doctype_stats(index)
|
||||
except ES_EXCEPTIONS:
|
||||
write_stats[index] = None
|
||||
|
||||
try:
|
||||
indexes = get_indexes()
|
||||
indexes.sort(key=lambda m: m[0])
|
||||
except ES_EXCEPTIONS as e:
|
||||
error_messages.append("Error: {0}".format(repr(e)))
|
||||
|
||||
recent_records = Record.objects.all()[:100]
|
||||
outstanding_records = Record.objects.outstanding()
|
||||
|
||||
index_groups = set(settings.ES_INDEXES.keys())
|
||||
index_groups |= set(settings.ES_WRITE_INDEXES.keys())
|
||||
|
||||
index_group_data = [[group, read_index(group), write_index(group)] for group in index_groups]
|
||||
|
||||
return render(
|
||||
request,
|
||||
"admin/search_maintenance.html",
|
||||
{
|
||||
"title": "Search",
|
||||
"es_deets": es_deets,
|
||||
"doctype_stats": stats,
|
||||
"doctype_write_stats": write_stats,
|
||||
"indexes": indexes,
|
||||
"index_groups": index_groups,
|
||||
"index_group_data": index_group_data,
|
||||
"read_indexes": all_read_indexes,
|
||||
"write_indexes": all_write_indexes,
|
||||
"error_messages": error_messages,
|
||||
"recent_records": recent_records,
|
||||
"outstanding_records": outstanding_records,
|
||||
"now": datetime.now(),
|
||||
"read_index": read_index,
|
||||
"write_index": write_index,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
admin.site.register_view(path="search-maintenance", view=search, name="Search - Index Maintenance")
|
||||
|
||||
|
||||
def _fix_results(results):
|
||||
"""Fixes up the S results for better templating
|
||||
|
||||
1. extract the results_dict from the DefaultMappingType
|
||||
and returns that as a dict
|
||||
2. turns datestamps into Python datetime objects
|
||||
|
||||
Note: This abuses ElasticUtils DefaultMappingType by using
|
||||
the private _results_dict.
|
||||
|
||||
"""
|
||||
results = [obj._results_dict for obj in results]
|
||||
for obj in results:
|
||||
# Convert datestamps (which are in seconds since epoch) to
|
||||
# Python datetime objects.
|
||||
for key in ("indexed_on", "created", "updated"):
|
||||
if key in obj and not isinstance(obj[key], datetime):
|
||||
obj[key] = datetime.fromtimestamp(int(obj[key]))
|
||||
return results
|
||||
|
||||
|
||||
def index_view(request):
|
||||
requested_bucket = request.GET.get("bucket", "")
|
||||
requested_id = request.GET.get("id", "")
|
||||
last_20_by_bucket = None
|
||||
data = None
|
||||
|
||||
bucket_to_model = dict([(cls.get_mapping_type_name(), cls) for cls in get_mapping_types()])
|
||||
|
||||
if requested_bucket and requested_id:
|
||||
# Nix whitespace because I keep accidentally picking up spaces
|
||||
# when I copy and paste.
|
||||
requested_id = requested_id.strip()
|
||||
|
||||
# The user wants to see a specific item in the index, so we
|
||||
# attempt to fetch it from the index and show that
|
||||
# specifically.
|
||||
if requested_bucket not in bucket_to_model:
|
||||
raise Http404
|
||||
|
||||
cls = bucket_to_model[requested_bucket]
|
||||
data = list(cls.search().filter(id=requested_id))
|
||||
if not data:
|
||||
raise Http404
|
||||
data = _fix_results(data)[0]
|
||||
|
||||
else:
|
||||
# Create a list of (class, list-of-dicts) showing us the most
|
||||
# recently indexed items for each bucket. We only display the
|
||||
# id, title and indexed_on fields, so only pull those back from
|
||||
# ES.
|
||||
last_20_by_bucket = [
|
||||
(cls_name, _fix_results(cls.search().order_by("-indexed_on")[:20]))
|
||||
for cls_name, cls in list(bucket_to_model.items())
|
||||
]
|
||||
|
||||
return render(
|
||||
request,
|
||||
"admin/search_index.html",
|
||||
{
|
||||
"title": "Index Browsing",
|
||||
"buckets": [cls_name for cls_name, cls in list(bucket_to_model.items())],
|
||||
"last_20_by_bucket": last_20_by_bucket,
|
||||
"requested_bucket": requested_bucket,
|
||||
"requested_id": requested_id,
|
||||
"requested_data": data,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
admin.site.register_view(path="search-index", view=index_view, name="Search - Index Browsing")
|
||||
|
||||
|
||||
class SynonymAdmin(admin.ModelAdmin):
|
||||
list_display = ("id", "from_words", "to_words")
|
||||
list_display_links = ("id",)
|
||||
list_editable = ("from_words", "to_words")
|
||||
ordering = ("from_words", "id")
|
||||
|
||||
|
||||
admin.site.register(Synonym, SynonymAdmin)
|
||||
|
||||
|
||||
def synonym_editor(request):
|
||||
parse_errors = []
|
||||
all_synonyms = Synonym.objects.all()
|
||||
|
||||
if "sync_synonyms" in request.POST:
|
||||
# This is a task. Normally we would call tasks asynchronously, right?
|
||||
# In this case, since it runs quickly and is in the admin interface,
|
||||
# the advantage of it being run in the request/response cycle
|
||||
# outweighs the delay in responding. If this becomes a problem
|
||||
# we should make a better UI and make this .delay() again.
|
||||
update_synonyms_task()
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
synonyms_text = request.POST.get("synonyms_text")
|
||||
if synonyms_text is not None:
|
||||
db_syns = set((s.from_words, s.to_words) for s in all_synonyms)
|
||||
|
||||
try:
|
||||
post_syns = set(synonym_utils.parse_synonyms(synonyms_text))
|
||||
except synonym_utils.SynonymParseError as e:
|
||||
parse_errors = e.errors
|
||||
else:
|
||||
syns_to_add = post_syns - db_syns
|
||||
syns_to_remove = db_syns - post_syns
|
||||
|
||||
for (from_words, to_words) in syns_to_remove:
|
||||
# This uses .get() because I want it to blow up if
|
||||
# there isn't exactly 1 matching synonym.
|
||||
(Synonym.objects.get(from_words=from_words, to_words=to_words).delete())
|
||||
|
||||
for (from_words, to_words) in syns_to_add:
|
||||
Synonym(from_words=from_words, to_words=to_words).save()
|
||||
|
||||
return HttpResponseRedirect(request.path)
|
||||
|
||||
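# Tiny worked example of the set arithmetic above (values are illustrative):
#   db_syns   = {("mac", "osx"), ("hi", "hello")}
#   post_syns = {("hi", "hello"), ("crash", "hang")}
#   syns_to_add    == {("crash", "hang")}
#   syns_to_remove == {("mac", "osx")}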
# If synonyms_text is not None, it came from POST, and there were
|
||||
# errors. It shouldn't be modified, so the error messages make sense.
|
||||
if synonyms_text is None:
|
||||
synonyms_text = "\n".join(str(s) for s in all_synonyms)
|
||||
|
||||
synonym_add_count, synonym_remove_count = synonym_utils.count_out_of_date()
|
||||
|
||||
return render(
|
||||
request,
|
||||
"admin/search_synonyms.html",
|
||||
{
|
||||
"synonyms_text": synonyms_text,
|
||||
"errors": parse_errors,
|
||||
"synonym_add_count": synonym_add_count,
|
||||
"synonym_remove_count": synonym_remove_count,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
admin.site.register_view(path="synonym_bulk", view=synonym_editor, name="Search - Synonym Editor")
|
|
@@ -1,163 +0,0 @@
|
|||
from django.conf import settings
|
||||
|
||||
from elasticsearch import RequestsHttpConnection
|
||||
from rest_framework import serializers
|
||||
from rest_framework.decorators import api_view
|
||||
from rest_framework.response import Response
|
||||
|
||||
from kitsune.products.models import Product
|
||||
from kitsune.questions.models import Question, QuestionMappingType
|
||||
from kitsune.questions.api import QuestionSerializer
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.sumo.api_utils import GenericAPIException
|
||||
from kitsune.wiki.api import DocumentDetailSerializer
|
||||
from kitsune.wiki.models import Document, DocumentMappingType
|
||||
|
||||
|
||||
def positive_integer(value):
|
||||
if value < 0:
|
||||
raise serializers.ValidationError("This field must be positive.")
|
||||
|
||||
|
||||
def valid_product(value):
|
||||
if not value:
|
||||
return
|
||||
|
||||
if not Product.objects.filter(slug=value).exists():
|
||||
raise serializers.ValidationError('Could not find product with slug "{0}".'.format(value))
|
||||
|
||||
|
||||
def valid_locale(value):
|
||||
if not value:
|
||||
return
|
||||
|
||||
if value not in settings.SUMO_LANGUAGES:
|
||||
if value in settings.NON_SUPPORTED_LOCALES:
|
||||
fallback = settings.NON_SUPPORTED_LOCALES[value] or settings.WIKI_DEFAULT_LANGUAGE
|
||||
raise serializers.ValidationError(
|
||||
'"{0}" is not supported, but has fallback locale "{1}".'.format(value, fallback)
|
||||
)
|
||||
else:
|
||||
raise serializers.ValidationError('Could not find locale "{0}".'.format(value))
|
||||
|
||||
|
||||
class SuggestSerializer(serializers.Serializer):
|
||||
q = serializers.CharField(required=True)
|
||||
locale = serializers.CharField(
|
||||
required=False, default=settings.WIKI_DEFAULT_LANGUAGE, validators=[valid_locale]
|
||||
)
|
||||
product = serializers.CharField(required=False, default="", validators=[valid_product])
|
||||
max_questions = serializers.IntegerField(
|
||||
required=False, default=10, validators=[positive_integer]
|
||||
)
|
||||
max_documents = serializers.IntegerField(
|
||||
required=False, default=10, validators=[positive_integer]
|
||||
)
|
||||
|
||||
|
||||
@api_view(["GET", "POST"])
|
||||
def suggest(request):
|
||||
if request.data and request.GET:
|
||||
raise GenericAPIException(
|
||||
400, "Put all parameters either in the querystring or the HTTP request body."
|
||||
)
|
||||
|
||||
serializer = SuggestSerializer(data=(request.data or request.GET))
|
||||
if not serializer.is_valid():
|
||||
raise GenericAPIException(400, serializer.errors)
|
||||
|
||||
searcher = (
|
||||
es_utils.AnalyzerS()
|
||||
.es(
|
||||
urls=settings.ES_URLS,
|
||||
timeout=settings.ES_TIMEOUT,
|
||||
use_ssl=settings.ES_USE_SSL,
|
||||
http_auth=settings.ES_HTTP_AUTH,
|
||||
connection_class=RequestsHttpConnection,
|
||||
)
|
||||
.indexes(es_utils.read_index("default"))
|
||||
)
|
||||
|
||||
data = serializer.validated_data
|
||||
|
||||
return Response(
|
||||
{
|
||||
"questions": _question_suggestions(
|
||||
searcher, data["q"], data["locale"], data["product"], data["max_questions"]
|
||||
),
|
||||
"documents": _document_suggestions(
|
||||
searcher, data["q"], data["locale"], data["product"], data["max_documents"]
|
||||
),
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
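# A minimal sketch (not part of the original file) of how the serializer
# above validated its inputs; the query values here are made up for
# illustration and assume "en-US" is in settings.SUMO_LANGUAGES.
serializer = SuggestSerializer(data={"q": "firefox crashes", "locale": "en-US"})
if serializer.is_valid():
    data = serializer.validated_data
    # Unspecified fields fall back to their declared defaults.
    print(data["product"], data["max_questions"], data["max_documents"])  # -> '' 10 10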
def _question_suggestions(searcher, text, locale, product, max_results):
|
||||
if max_results <= 0:
|
||||
return []
|
||||
|
||||
search_filter = es_utils.F(
|
||||
model="questions_question",
|
||||
question_is_archived=False,
|
||||
question_is_locked=False,
|
||||
question_is_solved=True,
|
||||
)
|
||||
if product:
|
||||
search_filter &= es_utils.F(product=product)
|
||||
if locale:
|
||||
search_filter &= es_utils.F(question_locale=locale)
|
||||
|
||||
questions = []
|
||||
searcher = _query(searcher, QuestionMappingType, search_filter, text, locale)
|
||||
|
||||
question_ids = [result["id"] for result in searcher[:max_results]]
|
||||
questions = [
|
||||
QuestionSerializer(instance=q).data for q in Question.objects.filter(id__in=question_ids)
|
||||
]
|
||||
|
||||
return questions
|
||||
|
||||
|
||||
def _document_suggestions(searcher, text, locale, product, max_results):
|
||||
if max_results <= 0:
|
||||
return []
|
||||
|
||||
search_filter = es_utils.F(
|
||||
model="wiki_document",
|
||||
document_category__in=settings.SEARCH_DEFAULT_CATEGORIES,
|
||||
document_locale=locale,
|
||||
document_is_archived=False,
|
||||
)
|
||||
|
||||
if product:
|
||||
search_filter &= es_utils.F(product=product)
|
||||
|
||||
documents = []
|
||||
searcher = _query(searcher, DocumentMappingType, search_filter, text, locale)
|
||||
|
||||
doc_ids = [result["id"] for result in searcher[:max_results]]
|
||||
|
||||
documents = [
|
||||
DocumentDetailSerializer(instance=doc).data
|
||||
for doc in Document.objects.filter(id__in=doc_ids)
|
||||
]
|
||||
|
||||
return documents
|
||||
|
||||
|
||||
def _query(searcher, mapping_type, search_filter, query_text, locale):
|
||||
query_fields = mapping_type.get_query_fields()
|
||||
query = {}
|
||||
for field in query_fields:
|
||||
for query_type in ["match", "match_phrase"]:
|
||||
key = "{0}__{1}".format(field, query_type)
|
||||
query[key] = query_text
|
||||
|
||||
# Transform query to be locale aware.
|
||||
query = es_utils.es_query_with_analyzer(query, locale)
|
||||
|
||||
return (
|
||||
searcher.doctypes(mapping_type.get_mapping_type_name())
|
||||
.filter(search_filter)
|
||||
.query(should=True, **query)
|
||||
)
|
|
@@ -372,6 +372,5 @@ ES_LOCALE_ANALYZERS = {
}

DEFAULT_ES7_CONNECTION = "es7_default"

# default refresh_interval for all indices
DEFAULT_ES7_REFRESH_INTERVAL = "60s"

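A hedged sketch of how a named connection alias like DEFAULT_ES7_CONNECTION is
typically registered with elasticsearch-dsl; the host URL is an illustrative
assumption, not taken from this commit.

from elasticsearch_dsl import connections

# Register an ES7 client under the alias used by the new search code.
connections.create_connection(
    alias="es7_default",  # value of DEFAULT_ES7_CONNECTION above
    hosts=["http://localhost:9200"],
)
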
@@ -1,891 +0,0 @@
|
|||
import json
|
||||
import logging
|
||||
import pprint
|
||||
import time
|
||||
from functools import wraps
|
||||
|
||||
import requests
|
||||
from django.conf import settings
|
||||
from django.db import reset_queries
|
||||
from django.http import HttpResponse
|
||||
from django.shortcuts import render
|
||||
from django.utils.translation import ugettext as _
|
||||
from elasticutils import S as UntypedS
|
||||
from elasticutils.contrib.django import ES_EXCEPTIONS, F, S, get_es # noqa
|
||||
|
||||
from kitsune.search import config
|
||||
from kitsune.search.utils import chunked
|
||||
|
||||
# These used to be constants, but that was problematic. Things like
|
||||
# tests want to be able to dynamically change settings at run time,
|
||||
# which isn't possible if these are constants.
|
||||
|
||||
|
||||
def read_index(group):
|
||||
"""Gets the name of the read index for a group."""
|
||||
return "%s_%s" % (settings.ES_INDEX_PREFIX, settings.ES_INDEXES[group])
|
||||
|
||||
|
||||
def write_index(group):
|
||||
"""Gets the name of the write index for a group."""
|
||||
return "%s_%s" % (settings.ES_INDEX_PREFIX, settings.ES_WRITE_INDEXES[group])
|
||||
|
||||
|
||||
def all_read_indexes():
|
||||
return [read_index(group) for group in list(settings.ES_INDEXES.keys())]
|
||||
|
||||
|
||||
def all_write_indexes():
|
||||
return [write_index(group) for group in list(settings.ES_WRITE_INDEXES.keys())]
|
||||
|
||||
|
||||
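# Worked example of the naming scheme above (values are illustrative):
# with ES_INDEX_PREFIX = "sumo" and ES_INDEXES = {"default": "main"},
# read_index("default") returns "sumo_main" and all_read_indexes()
# returns ["sumo_main"].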
# The number of things in a chunk. This is for parallel indexing via
|
||||
# the admin.
|
||||
CHUNK_SIZE = 20000
|
||||
|
||||
|
||||
log = logging.getLogger("k.search.es")
|
||||
|
||||
|
||||
class MappingMergeError(Exception):
|
||||
"""Represents a mapping merge error"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
class UnindexMeBro(Exception):
|
||||
"""Raise in extract_document when doc should be removed."""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
class AnalyzerMixin(object):
|
||||
def _with_analyzer(self, key, val, action):
|
||||
"""Do a normal kind of query, with a analyzer added.
|
||||
|
||||
:arg key: is the field being searched
|
||||
:arg val: is a two-tuple of the text to query for and the name of
|
||||
the analyzer to use.
|
||||
:arg action: is the type of query being performed, like match or
|
||||
match_phrase
|
||||
"""
|
||||
query, analyzer = val
|
||||
clause = {
|
||||
action: {
|
||||
key: {
|
||||
"query": query,
|
||||
"analyzer": analyzer,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
boost = self.field_boosts.get(key)
|
||||
if boost is not None:
|
||||
clause[action][key]["boost"] = boost
|
||||
|
||||
return clause
|
||||
|
||||
def process_query_match_phrase_analyzer(self, key, val, action):
|
||||
"""A match phrase query that includes an analyzer."""
|
||||
return self._with_analyzer(key, val, "match_phrase")
|
||||
|
||||
def process_query_match_analyzer(self, key, val, action):
|
||||
"""A match query that includes an analyzer."""
|
||||
return self._with_analyzer(key, val, "match")
|
||||
|
||||
def process_query_sqs(self, key, val, action):
|
||||
"""Implements simple_query_string query"""
|
||||
return {
|
||||
"simple_query_string": {
|
||||
"fields": [key],
|
||||
"query": val,
|
||||
"default_operator": "or",
|
||||
}
|
||||
}
|
||||
|
||||
def process_query_sqs_analyzer(self, key, val, action):
|
||||
"""Implements sqs query that includes an analyzer"""
|
||||
query, analyzer = val
|
||||
return {
|
||||
"simple_query_string": {
|
||||
"fields": [key],
|
||||
"query": query,
|
||||
"analyzer": analyzer,
|
||||
"default_operator": "or",
|
||||
}
|
||||
}
|
||||
|
||||
def process_query_match_whitespace(self, key, val, action):
|
||||
"""A match query that uses the whitespace analyzer."""
|
||||
return {
|
||||
"match": {
|
||||
key: {
|
||||
"query": val,
|
||||
"analyzer": "whitespace",
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
class Sphilastic(S, AnalyzerMixin):
|
||||
"""Shim around elasticutils.contrib.django.S.
|
||||
|
||||
Implements some Kitsune-specific behavior to make our lives
|
||||
easier.
|
||||
|
||||
.. Note::
|
||||
|
||||
This looks at the read index. If you need to look at something
|
||||
different, build your own S.
|
||||
|
||||
"""
|
||||
|
||||
def print_query(self):
|
||||
pprint.pprint(self._build_query())
|
||||
|
||||
def get_indexes(self):
|
||||
# SphilasticUnified is a searcher and so it's _always_ used in
|
||||
# a read context. Therefore, we always return the read index.
|
||||
return [read_index(self.type.get_index_group())]
|
||||
|
||||
def process_query_mlt(self, key, val, action):
|
||||
"""Add support for a more like this query to our S.
|
||||
|
||||
val is expected to be a dict like:
|
||||
{
|
||||
'fields': ['field1', 'field2'],
|
||||
'like_text': 'text like this one',
|
||||
}
|
||||
"""
|
||||
return {
|
||||
"more_like_this": val,
|
||||
}
|
||||
|
||||
|
||||
class AnalyzerS(UntypedS, AnalyzerMixin):
|
||||
"""This is to give the search view support for setting the analyzer.
|
||||
|
||||
This differs from Sphilastic in that this is a plain ES S object,
|
||||
not based on Django.
|
||||
|
||||
This just exists as a way to mix together UntypedS and AnalyzerMixin.
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
def get_mappings(index):
|
||||
mappings = {}
|
||||
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
for cls in get_mapping_types():
|
||||
group = cls.get_index_group()
|
||||
if index == write_index(group) or index == read_index(group):
|
||||
mappings[cls.get_mapping_type_name()] = cls.get_mapping()
|
||||
|
||||
return mappings
|
||||
|
||||
|
||||
def get_all_mappings():
|
||||
mappings = {}
|
||||
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
for cls in get_mapping_types():
|
||||
mappings[cls.get_mapping_type_name()] = cls.get_mapping()
|
||||
|
||||
return mappings
|
||||
|
||||
|
||||
def get_indexes(all_indexes=False):
|
||||
"""Query ES to get a list of indexes that actually exist.
|
||||
|
||||
:returns: A dict like {index_name: document_count}.
|
||||
"""
|
||||
es = get_es()
|
||||
status = es.indices.status()
|
||||
indexes = status["indices"]
|
||||
|
||||
if not all_indexes:
|
||||
indexes = dict(
|
||||
(k, v) for k, v in list(indexes.items()) if k.startswith(settings.ES_INDEX_PREFIX)
|
||||
)
|
||||
|
||||
return [(name, value["docs"]["num_docs"]) for name, value in list(indexes.items())]
|
||||
|
||||
|
||||
def get_doctype_stats(index):
|
||||
"""Returns a dict of name -> count for documents indexed.
|
||||
|
||||
For example:
|
||||
|
||||
>>> get_doctype_stats()
|
||||
{'questions_question': 14216, 'forums_thread': 419, 'wiki_document': 759}
|
||||
|
||||
:throws elasticsearch.exceptions.ConnectionError: if there is a
|
||||
connection error, including a timeout.
|
||||
:throws elasticsearch.exceptions.NotFound: if the index doesn't exist
|
||||
|
||||
"""
|
||||
stats = {}
|
||||
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
for cls in get_mapping_types():
|
||||
if cls.get_index() == index:
|
||||
# Note: Can't use cls.search() here since that returns a
|
||||
# Sphilastic which is hard-coded to look only at the
|
||||
# read index..
|
||||
s = S(cls).indexes(index)
|
||||
stats[cls.get_mapping_type_name()] = s.count()
|
||||
|
||||
return stats
|
||||
|
||||
|
||||
def delete_index(index):
|
||||
get_es().indices.delete(index=index, ignore=[404])
|
||||
|
||||
|
||||
def format_time(time_to_go):
|
||||
"""Returns minutes and seconds string for given time in seconds"""
|
||||
if time_to_go < 60:
|
||||
return "%ds" % time_to_go
|
||||
return "%dm %ds" % (time_to_go / 60, time_to_go % 60)
|
||||
|
||||
|
||||
def get_documents(cls, ids):
|
||||
"""Returns a list of ES documents with specified ids and doctype
|
||||
|
||||
:arg cls: the mapping type class with a ``.search()`` to use
|
||||
:arg ids: the list of ids to retrieve documents for
|
||||
|
||||
:returns: list of documents as dicts
|
||||
"""
|
||||
# FIXME: We pull the field names from the mapping, but I'm not
|
||||
# sure if this works in all cases or not and it's kind of hacky.
|
||||
fields = list(cls.get_mapping()["properties"].keys())
|
||||
ret = cls.search().filter(id__in=ids).values_dict(*fields)[: len(ids)]
|
||||
return cls.reshape(ret)
|
||||
|
||||
|
||||
def get_analysis():
|
||||
"""Generate all our custom analyzers, tokenizers, and filters
|
||||
|
||||
These are variants of the Snowball analyzer for various languages,
|
||||
but could also include custom analyzers if the need arises.
|
||||
"""
|
||||
analyzers = {}
|
||||
filters = {}
|
||||
|
||||
# The keys are locales to look up to decide the analyzer's name.
|
||||
# The values are the language name to set for Snowball.
|
||||
snowball_langs = {
|
||||
"eu": "Basque",
|
||||
"ca": "Catalan",
|
||||
"da": "Danish",
|
||||
"nl": "Dutch",
|
||||
"en-US": "English",
|
||||
"fi": "Finnish",
|
||||
"fr": "French",
|
||||
"de": "German",
|
||||
"hu": "Hungarian",
|
||||
"it": "Italian",
|
||||
"no": "Norwegian",
|
||||
"pt-BR": "Portuguese",
|
||||
"ro": "Romanian",
|
||||
"ru": "Russian",
|
||||
"es": "Spanish",
|
||||
"sv": "Swedish",
|
||||
"tr": "Turkish",
|
||||
}
|
||||
|
||||
for locale, language in list(snowball_langs.items()):
|
||||
analyzer_name = es_analyzer_for_locale(locale, synonyms=False)
|
||||
analyzers[analyzer_name] = {
|
||||
"type": "snowball",
|
||||
"language": language,
|
||||
}
|
||||
|
||||
# The snowball analyzer is actually just a shortcut that does
|
||||
# a particular set of tokenizers and analyzers. According to
|
||||
# the docs, the below is the same as that, plus synonym handling.
|
||||
|
||||
if locale in config.ES_SYNONYM_LOCALES:
|
||||
analyzer_name = es_analyzer_for_locale(locale, synonyms=True)
|
||||
analyzers[analyzer_name] = {
|
||||
"type": "custom",
|
||||
"tokenizer": "standard",
|
||||
"filter": [
|
||||
"standard",
|
||||
"lowercase",
|
||||
"synonyms-" + locale,
|
||||
"stop",
|
||||
"snowball-" + locale,
|
||||
],
|
||||
}
|
||||
|
||||
for locale in config.ES_SYNONYM_LOCALES:
|
||||
filter_name, filter_body = es_get_synonym_filter(locale)
|
||||
filters[filter_name] = filter_body
|
||||
filters["snowball-" + locale] = {
|
||||
"type": "snowball",
|
||||
"language": snowball_langs[locale],
|
||||
}
|
||||
|
||||
# Done!
|
||||
return {
|
||||
"analyzer": analyzers,
|
||||
"filter": filters,
|
||||
}
|
||||
|
||||
|
||||
def es_get_synonym_filter(locale):
|
||||
# Avoid circular import
|
||||
from kitsune.search.models import Synonym
|
||||
|
||||
# The synonym filter doesn't like it if the synonyms list is empty.
|
||||
# If there are no synonyms, just make a no-op filter by making a
|
||||
# synonym from one word to itself.
|
||||
# TODO: Someday this should be something like .filter(locale=locale)
|
||||
synonyms = list(Synonym.objects.all()) or ["firefox => firefox"]
|
||||
name = "synonyms-" + locale
|
||||
body = {
|
||||
"type": "synonym",
|
||||
"synonyms": [str(s) for s in synonyms],
|
||||
}
|
||||
|
||||
return name, body
|
||||
|
||||
|
||||
def recreate_indexes(es=None, indexes=None):
|
||||
"""Deletes indexes and recreates them.
|
||||
|
||||
:arg es: An ES object to use. Defaults to calling `get_es()`.
|
||||
:arg indexes: A list of indexes to recreate. Defaults to all write
|
||||
indexes.
|
||||
"""
|
||||
if es is None:
|
||||
es = get_es()
|
||||
if indexes is None:
|
||||
indexes = all_write_indexes()
|
||||
|
||||
for index in indexes:
|
||||
delete_index(index)
|
||||
|
||||
# There should be no mapping-conflict race here since the index doesn't
|
||||
# exist. Live indexing should just fail.
|
||||
|
||||
# Simultaneously create the index, the mappings, the analyzers, and
|
||||
# the tokenizers, so live indexing doesn't get a chance to index
|
||||
# anything between and infer a bogus mapping (which ES then freaks
|
||||
# out over when we try to lay in an incompatible explicit mapping).
|
||||
es.indices.create(
|
||||
index=index,
|
||||
body={
|
||||
"mappings": get_mappings(index),
|
||||
"settings": {
|
||||
"analysis": get_analysis(),
|
||||
},
|
||||
},
|
||||
)
|
||||
|
||||
# Wait until the index is there.
|
||||
es.cluster.health(wait_for_status="yellow")
|
||||
|
||||
|
||||
def get_index_settings(index):
|
||||
"""Returns ES settings for this index"""
|
||||
return get_es().indices.get_settings(index=index).get(index, {}).get("settings", {})
|
||||
|
||||
|
||||
def get_indexable(percent=100, seconds_ago=0, mapping_types=None):
|
||||
"""Returns a list of (class, iterable) for all the things to index
|
||||
|
||||
:arg percent: Defaults to 100. Allows you to specify how much of
|
||||
each doctype you want to index. This is useful for
|
||||
development where doing a full reindex takes an hour.
|
||||
:arg mapping_types: The list of mapping types to index.
|
||||
|
||||
"""
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
# Note: Passing in None will get all the mapping types
|
||||
mapping_types = get_mapping_types(mapping_types)
|
||||
|
||||
to_index = []
|
||||
percent = float(percent) / 100
|
||||
for cls in mapping_types:
|
||||
indexable = cls.get_indexable(seconds_ago=seconds_ago)
|
||||
if percent < 1:
|
||||
indexable = indexable[: int(indexable.count() * percent)]
|
||||
to_index.append((cls, indexable))
|
||||
|
||||
return to_index
|
||||
|
||||
|
||||
def index_chunk(cls, id_list, reraise=False):
|
||||
"""Index a chunk of documents.
|
||||
|
||||
:arg cls: The MappingType class.
|
||||
:arg id_list: Iterable of ids of that MappingType to index.
|
||||
:arg reraise: False if you want errors to be swallowed and True
|
||||
if you want errors to be thrown.
|
||||
|
||||
"""
|
||||
# Note: This bulk indexes in batches of 80. I didn't arrive at
|
||||
# this number through a proper scientific method. It's possible
|
||||
# there's a better number. It takes a while to fiddle with,
|
||||
# though. Probably best to expose the number as an environment
|
||||
# variable, then run a script that takes timings for
|
||||
# --criticalmass, runs overnight and returns a more "optimal"
|
||||
# number.
|
||||
for ids in chunked(id_list, 80):
|
||||
documents = []
|
||||
for id_ in ids:
|
||||
try:
|
||||
documents.append(cls.extract_document(id_))
|
||||
|
||||
except UnindexMeBro:
|
||||
# extract_document throws this in cases where we need
|
||||
# to remove the item from the index.
|
||||
cls.unindex(id_)
|
||||
|
||||
except Exception:
|
||||
log.exception("Unable to extract/index document (id: %d)", id_)
|
||||
if reraise:
|
||||
raise
|
||||
|
||||
if documents:
|
||||
cls.bulk_index(documents, id_field="id")
|
||||
|
||||
if settings.DEBUG:
|
||||
# Nix queries so that this doesn't become a complete
|
||||
# memory hog and make Will's computer sad when DEBUG=True.
|
||||
reset_queries()
|
||||
|
||||
|
||||
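# A possible follow-up to the "expose the number as an environment variable"
# note in index_chunk() above (purely illustrative; ES_INDEX_BATCH_SIZE is
# not an existing kitsune setting):
#
#     import os
#     batch_size = int(os.environ.get("ES_INDEX_BATCH_SIZE", "80"))
#     for ids in chunked(id_list, batch_size):
#         ...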
def es_reindex_cmd(
|
||||
percent=100, delete=False, mapping_types=None, criticalmass=False, seconds_ago=0, log=log
|
||||
):
|
||||
"""Rebuild ElasticSearch indexes
|
||||
|
||||
:arg percent: 1 to 100--the percentage of the db to index
|
||||
:arg delete: whether or not to wipe the index before reindexing
|
||||
:arg mapping_types: list of mapping types to index
|
||||
:arg criticalmass: whether or not to index just a critical mass of
|
||||
things
|
||||
:arg seconds_ago: things updated less than this number of seconds
|
||||
ago should be reindexed
|
||||
:arg log: the logger to use
|
||||
|
||||
"""
|
||||
es = get_es()
|
||||
|
||||
if mapping_types is None:
|
||||
indexes = all_write_indexes()
|
||||
else:
|
||||
indexes = indexes_for_doctypes(mapping_types)
|
||||
|
||||
need_delete = False
|
||||
for index in indexes:
|
||||
try:
|
||||
# This is used to see if the index exists.
|
||||
get_doctype_stats(index)
|
||||
except ES_EXCEPTIONS:
|
||||
if not delete:
|
||||
log.error('The index "%s" does not exist. ' "You must specify --delete." % index)
|
||||
need_delete = True
|
||||
if need_delete:
|
||||
return
|
||||
|
||||
if delete:
|
||||
log.info("wiping and recreating %s...", ", ".join(indexes))
|
||||
recreate_indexes(es, indexes)
|
||||
|
||||
if criticalmass:
|
||||
# The critical mass is defined as the entire KB plus the most
|
||||
# recent 15k questions (which is about how many questions
|
||||
# were created in the last 180 days). We build that
|
||||
# indexable here.
|
||||
|
||||
# Get only questions and wiki document stuff.
|
||||
all_indexable = get_indexable(mapping_types=["questions_question", "wiki_document"])
|
||||
|
||||
# The first item is questions because we specified that
|
||||
# order. Old questions don't show up in searches, so we nix
|
||||
# them by reversing the list (ordered by id ascending) and
|
||||
# slicing it.
|
||||
all_indexable[0] = (all_indexable[0][0], list(reversed(all_indexable[0][1]))[:15000])
|
||||
|
||||
elif mapping_types:
|
||||
all_indexable = get_indexable(percent, seconds_ago, mapping_types)
|
||||
|
||||
else:
|
||||
all_indexable = get_indexable(percent, seconds_ago)
|
||||
|
||||
try:
|
||||
old_refreshes = {}
|
||||
# We're doing a lot of indexing, so we get the refresh_interval of
|
||||
# the index currently, then nix refreshing. Later we'll restore it.
|
||||
for index in indexes:
|
||||
old_refreshes[index] = get_index_settings(index).get("index.refresh_interval", "1s")
|
||||
# Disable automatic refreshing
|
||||
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "-1"}})
|
||||
|
||||
start_time = time.time()
|
||||
for cls, indexable in all_indexable:
|
||||
cls_start_time = time.time()
|
||||
total = len(indexable)
|
||||
|
||||
if total == 0:
|
||||
continue
|
||||
|
||||
chunk_start_time = time.time()
|
||||
log.info("reindexing %s. %s to index....", cls.get_mapping_type_name(), total)
|
||||
|
||||
i = 0
|
||||
for chunk in chunked(indexable, 1000):
|
||||
chunk_start_time = time.time()
|
||||
index_chunk(cls, chunk)
|
||||
|
||||
i += len(chunk)
|
||||
time_to_go = (total - i) * ((time.time() - cls_start_time) / i)
|
||||
per_1000 = (time.time() - cls_start_time) / (i / 1000.0)
|
||||
this_1000 = time.time() - chunk_start_time
|
||||
|
||||
log.info(
|
||||
" %s/%s %s... (%s/1000 avg, %s ETA)",
|
||||
i,
|
||||
total,
|
||||
format_time(this_1000),
|
||||
format_time(per_1000),
|
||||
format_time(time_to_go),
|
||||
)
|
||||
|
||||
delta_time = time.time() - cls_start_time
|
||||
log.info(
|
||||
" done! (%s total, %s/1000 avg)",
|
||||
format_time(delta_time),
|
||||
format_time(delta_time / (total / 1000.0)),
|
||||
)
|
||||
|
||||
delta_time = time.time() - start_time
|
||||
log.info("done! (%s total)", format_time(delta_time))
|
||||
|
||||
finally:
|
||||
# Re-enable automatic refreshing
|
||||
for index, old_refresh in list(old_refreshes.items()):
|
||||
es.indices.put_settings(index=index, body={"index": {"refresh_interval": old_refresh}})
|
||||
|
||||
|
||||
def es_delete_cmd(index, noinput=False, log=log):
|
||||
"""Deletes an index"""
|
||||
try:
|
||||
indexes = [name for name, count in get_indexes()]
|
||||
except ES_EXCEPTIONS:
|
||||
log.error(
|
||||
"Your elasticsearch process is not running or ES_URLS "
|
||||
"is set wrong in your settings_local.py file."
|
||||
)
|
||||
return
|
||||
|
||||
if index not in indexes:
|
||||
log.error('Index "%s" is not a valid index.', index)
|
||||
return
|
||||
|
||||
if index in all_read_indexes() and not noinput:
|
||||
ret = input('"%s" is a read index. Are you sure you want to delete it? (yes/no) ' % index)
|
||||
if ret != "yes":
|
||||
log.info("Not deleting the index.")
|
||||
return
|
||||
|
||||
log.info('Deleting index "%s"...', index)
|
||||
delete_index(index)
|
||||
log.info("Done!")
|
||||
|
||||
|
||||
def es_status_cmd(checkindex=False, log=log):
|
||||
"""Shows elastic search index status"""
|
||||
try:
|
||||
# TODO: SUMO has a single ES_URL and that's the ZLB and does
|
||||
# the balancing. If that ever changes and we have multiple
|
||||
# ES_URLs, then this should get fixed.
|
||||
es_deets = requests.get(settings.ES_URLS[0]).json()
|
||||
except requests.exceptions.RequestException:
|
||||
pass
|
||||
|
||||
read_doctype_stats = {}
|
||||
for index in all_read_indexes():
|
||||
try:
|
||||
read_doctype_stats[index] = get_doctype_stats(index)
|
||||
except ES_EXCEPTIONS:
|
||||
read_doctype_stats[index] = None
|
||||
|
||||
if set(all_read_indexes()) == set(all_write_indexes()):
|
||||
write_doctype_stats = read_doctype_stats
|
||||
else:
|
||||
write_doctype_stats = {}
|
||||
for index in all_write_indexes():
|
||||
try:
|
||||
write_doctype_stats[index] = get_doctype_stats(index)
|
||||
except ES_EXCEPTIONS:
|
||||
write_doctype_stats[index] = None
|
||||
|
||||
try:
|
||||
indexes = get_indexes(all_indexes=True)
|
||||
except ES_EXCEPTIONS:
|
||||
log.error(
|
||||
"Your elasticsearch process is not running or ES_URLS "
|
||||
"is set wrong in your settings_local.py file."
|
||||
)
|
||||
return
|
||||
|
||||
log.info("Elasticsearch:")
|
||||
log.info(" Version : %s", es_deets["version"]["number"])
|
||||
|
||||
log.info("Settings:")
|
||||
log.info(" ES_URLS : %s", settings.ES_URLS)
|
||||
log.info(" ES_INDEX_PREFIX : %s", settings.ES_INDEX_PREFIX)
|
||||
log.info(" ES_LIVE_INDEXING : %s", settings.ES_LIVE_INDEXING)
|
||||
log.info(" ES_INDEXES : %s", settings.ES_INDEXES)
|
||||
log.info(" ES_WRITE_INDEXES : %s", settings.ES_WRITE_INDEXES)
|
||||
|
||||
log.info("Index stats:")
|
||||
|
||||
if indexes:
|
||||
log.info(" List of indexes:")
|
||||
for name, count in sorted(indexes):
|
||||
read_write = []
|
||||
if name in all_read_indexes():
|
||||
read_write.append("READ")
|
||||
if name in all_write_indexes():
|
||||
read_write.append("WRITE")
|
||||
log.info(" %-22s: %s %s", name, count, "/".join(read_write))
|
||||
else:
|
||||
log.info(" There are no %s indexes.", settings.ES_INDEX_PREFIX)
|
||||
|
||||
if not read_doctype_stats:
|
||||
read_index_names = ", ".join(all_read_indexes())
|
||||
log.info(" No read indexes exist. (%s)", read_index_names)
|
||||
else:
|
||||
log.info(" Read indexes:")
|
||||
for index, stats in list(read_doctype_stats.items()):
|
||||
if stats is None:
|
||||
log.info(" %s does not exist", index)
|
||||
else:
|
||||
log.info(" %s:", index)
|
||||
for name, count in sorted(stats.items()):
|
||||
log.info(" %-22s: %d", name, count)
|
||||
|
||||
if set(all_read_indexes()) == set(all_write_indexes()):
|
||||
log.info(" Write indexes are the same as the read indexes.")
|
||||
else:
|
||||
if not write_doctype_stats:
|
||||
write_index_names = ", ".join(all_write_indexes())
|
||||
log.info(" No write indexes exist. (%s)", write_index_names)
|
||||
else:
|
||||
log.info(" Write indexes:")
|
||||
for index, stats in list(write_doctype_stats.items()):
|
||||
if stats is None:
|
||||
log.info(" %s does not exist", index)
|
||||
else:
|
||||
log.info(" %s:", index)
|
||||
for name, count in sorted(stats.items()):
|
||||
log.info(" %-22s: %d", name, count)
|
||||
|
||||
if checkindex:
|
||||
# Go through the index and verify everything
|
||||
log.info("Checking index contents....")
|
||||
|
||||
missing_docs = 0
|
||||
|
||||
for cls, id_list in get_indexable():
|
||||
for id_group in chunked(id_list, 100):
|
||||
doc_list = get_documents(cls, id_group)
|
||||
if len(id_group) != len(doc_list):
|
||||
doc_list_ids = [doc["id"] for doc in doc_list]
|
||||
for id_ in id_group:
|
||||
if id_ not in doc_list_ids:
|
||||
log.info(" Missing %s %s", cls.get_model_name(), id_)
|
||||
missing_docs += 1
|
||||
|
||||
if missing_docs:
|
||||
print("There were %d missing_docs" % missing_docs)
|
||||
|
||||
|
||||
def es_search_cmd(query, pages=1, log=log):
|
||||
"""Simulates a front page search"""
|
||||
from kitsune.sumo.tests import LocalizingClient
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
|
||||
client = LocalizingClient()
|
||||
|
||||
output = []
|
||||
output.append("Search for: %s" % query)
|
||||
output.append("")
|
||||
|
||||
data = {"q": query, "format": "json"}
|
||||
url = reverse("search")
|
||||
|
||||
# The search view shows 10 results at a time. So we hit it few
|
||||
# times---once for each page.
|
||||
for pageno in range(pages):
|
||||
pageno = pageno + 1
|
||||
data["page"] = pageno
|
||||
resp = client.get(url, data)
|
||||
if resp.status_code != 200:
|
||||
output.append("ERROR: %s" % resp.content)
|
||||
break
|
||||
|
||||
else:
|
||||
content = json.loads(resp.content)
|
||||
results = content["results"]
|
||||
|
||||
for mem in results:
|
||||
output.append(
|
||||
"%4d %5.2f %-10s %-20s"
|
||||
% (mem["rank"], mem["score"], mem["type"], mem["title"])
|
||||
)
|
||||
|
||||
output.append("")
|
||||
|
||||
for line in output:
|
||||
log.info(line.encode("ascii", "ignore"))
|
||||
|
||||
|
||||
def es_verify_cmd(log=log):
|
||||
log.info("Behold! I am the magificent esverify command and I shall verify")
|
||||
log.info("all things verifyable so that you can rest assured that your")
|
||||
log.info("changes are bereft of the tawdry clutches of whimsy and")
|
||||
log.info("misfortune.")
|
||||
log.info("")
|
||||
|
||||
log.info("Verifying mappings do not conflict.")
|
||||
|
||||
# Verify mappings that share the same index don't conflict
|
||||
for index in all_write_indexes():
|
||||
merged_mapping = {}
|
||||
|
||||
log.info("Verifying mappings for index: {index}".format(index=index))
|
||||
|
||||
start_time = time.time()
|
||||
for cls_name, mapping in list(get_mappings(index).items()):
|
||||
mapping = mapping["properties"]
|
||||
for key, val in list(mapping.items()):
|
||||
if key not in merged_mapping:
|
||||
merged_mapping[key] = (val, [cls_name])
|
||||
continue
|
||||
|
||||
# FIXME - We're comparing two dicts here. This might not
|
||||
# work for non-trivial dicts.
|
||||
if merged_mapping[key][0] != val:
|
||||
raise MappingMergeError(
|
||||
"%s key different for %s and %s" % (key, cls_name, merged_mapping[key][1])
|
||||
)
|
||||
|
||||
merged_mapping[key][1].append(cls_name)
|
||||
|
||||
log.info("Done! {0}".format(format_time(time.time() - start_time)))
|
||||
log.info("")
|
||||
|
||||
|
||||
def es_analyzer_for_locale(locale, synonyms=False, fallback="standard"):
|
||||
"""Pick an appropriate analyzer for a given locale.
|
||||
|
||||
If no analyzer is defined for `locale`, return fallback instead,
|
||||
which defaults to ES analyzer named "standard".
|
||||
|
||||
If `synonyms` is True, this will return a synonym-using analyzer,
|
||||
if that makes sense. In particular, it doesn't make sense to use
|
||||
synonyms with the fallback analyzer.
|
||||
"""
|
||||
|
||||
if locale in settings.ES_LOCALE_ANALYZERS:
|
||||
analyzer = settings.ES_LOCALE_ANALYZERS[locale]
|
||||
if synonyms and locale in config.ES_SYNONYM_LOCALES:
|
||||
analyzer += "-synonyms"
|
||||
else:
|
||||
analyzer = fallback
|
||||
|
||||
if not settings.ES_USE_PLUGINS and analyzer in settings.ES_PLUGIN_ANALYZERS:
|
||||
analyzer = fallback
|
||||
|
||||
return analyzer
|
||||
|
||||
|
||||
def es_query_with_analyzer(query, locale):
|
||||
"""Transform a query dict to use _analyzer actions for the right fields."""
|
||||
analyzer = es_analyzer_for_locale(locale, synonyms=True)
|
||||
new_query = {}
|
||||
|
||||
# Import locally to avoid circular import
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
localized_fields = []
|
||||
for mt in get_mapping_types():
|
||||
localized_fields.extend(mt.get_localized_fields())
|
||||
|
||||
for k, v in list(query.items()):
|
||||
field, action = k.split("__")
|
||||
if field in localized_fields:
|
||||
new_query[k + "_analyzer"] = (v, analyzer)
|
||||
else:
|
||||
new_query[k] = v
|
||||
|
||||
return new_query
|
||||
|
||||
|
||||
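# Worked example of the transformation above, assuming "document_title" is a
# localized field and the locale's analyzer resolves to "snowball-de" (both
# assumptions for illustration):
#   es_query_with_analyzer({"document_title__match": "firefox"}, "de")
#   returns {"document_title__match_analyzer": ("firefox", "snowball-de")}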
def indexes_for_doctypes(doctype):
|
||||
# Import locally to avoid circular import.
|
||||
from kitsune.search.models import get_mapping_types
|
||||
|
||||
return set(d.get_index() for d in get_mapping_types(doctype))
|
||||
|
||||
|
||||
def handle_es_errors(template, status_code=503):
|
||||
"""Handles Elasticsearch exceptions for views
|
||||
|
||||
Wrap the entire view in this and don't worry about Elasticsearch exceptions
|
||||
again!
|
||||
|
||||
:arg template: template path string or function to generate the template
|
||||
path string for HTML requests
|
||||
:arg status_code: status code to return
|
||||
|
||||
:returns: content-type-appropriate HttpResponse
|
||||
|
||||
"""
|
||||
|
||||
def handler(fun):
|
||||
@wraps(fun)
|
||||
def _handler(request, *args, **kwargs):
|
||||
try:
|
||||
return fun(request, *args, **kwargs)
|
||||
|
||||
except ES_EXCEPTIONS as exc:
|
||||
is_json = request.GET.get("format") == "json"
|
||||
callback = request.GET.get("callback", "").strip()
|
||||
content_type = "application/x-javascript" if callback else "application/json"
|
||||
if is_json:
|
||||
return HttpResponse(
|
||||
json.dumps({"error": _("Search Unavailable")}),
|
||||
content_type=content_type,
|
||||
status=status_code,
|
||||
)
|
||||
|
||||
# If template is a function, call it with the request, args
|
||||
# and kwargs to get the template.
|
||||
if callable(template):
|
||||
actual_template = template(request, *args, **kwargs)
|
||||
else:
|
||||
actual_template = template
|
||||
|
||||
# Log exceptions so this isn't failing silently
|
||||
log.exception(exc)
|
||||
|
||||
return render(request, actual_template, status=503)
|
||||
|
||||
return _handler
|
||||
|
||||
return handler
|
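# Hypothetical usage of the handle_es_errors decorator defined above; the
# view name and template path are illustrative, not from this commit.
@handle_es_errors("search/search-unavailable.html")
def search_view(request):
    ...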
|
@@ -13,7 +13,5 @@
  <Url type="application/opensearchdescription+xml"
       rel="self"
       template="{{ host }}{{ url('search.plugin', locale=locale) }}"/>
  <Url type="application/x-suggestions+json"
       template="{{ host }}{{ url('search.suggestions', locale=locale) }}?q={searchTerms}"/>
  <moz:SearchForm>{{ host }}{{ url('search', locale=locale) }}</moz:SearchForm>
</OpenSearchDescription>

@@ -1,21 +0,0 @@
from django.core.management.base import LabelCommand

from kitsune.search.es_utils import es_delete_cmd
from kitsune.search.utils import FakeLogger


class Command(LabelCommand):
    label = "index"
    help = "Delete an index from elastic search."

    def add_arguments(self, parser):
        super().add_arguments(parser)
        parser.add_argument(
            "--noinput",
            action="store_true",
            dest="noinput",
            help="Do not ask for input--just do it",
        )

    def handle_label(self, label, **options):
        es_delete_cmd(label, noinput=options["noinput"], log=FakeLogger(self.stdout))

@@ -1,87 +0,0 @@
|
|||
from django.core.management.base import BaseCommand, CommandError
|
||||
from django.test.utils import override_settings
|
||||
|
||||
from kitsune.search.es_utils import es_reindex_cmd
|
||||
from kitsune.search.utils import FakeLogger
|
||||
|
||||
|
||||
class Command(BaseCommand):
|
||||
help = "Reindex the database for Elastic."
|
||||
|
||||
def add_arguments(self, parser):
|
||||
parser.add_argument(
|
||||
"--percent",
|
||||
type=int,
|
||||
dest="percent",
|
||||
default=100,
|
||||
help="Reindex a percentage of things",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--delete", action="store_true", dest="delete", help="Wipes index before reindexing"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--hours-ago",
|
||||
type=int,
|
||||
dest="hours_ago",
|
||||
default=0,
|
||||
help="Reindex things updated N hours ago",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--minutes-ago",
|
||||
type=int,
|
||||
dest="minutes_ago",
|
||||
default=0,
|
||||
help="Reindex things updated N minutes ago",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--seconds-ago",
|
||||
type=int,
|
||||
dest="seconds_ago",
|
||||
default=0,
|
||||
help="Reindex things updated N seconds ago",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--mapping_types",
|
||||
dest="mapping_types",
|
||||
default=None,
|
||||
help="Comma-separated list of mapping types to index",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--criticalmass",
|
||||
action="store_true",
|
||||
dest="criticalmass",
|
||||
help="Indexes a critical mass of things",
|
||||
)
|
||||
|
||||
# We (ab)use override_settings to force ES_LIVE_INDEXING for the
|
||||
# duration of this command so that it actually indexes stuff.
|
||||
@override_settings(ES_LIVE_INDEXING=True)
|
||||
def handle(self, *args, **options):
|
||||
percent = options["percent"]
|
||||
delete = options["delete"]
|
||||
mapping_types = options["mapping_types"]
|
||||
criticalmass = options["criticalmass"]
|
||||
seconds_ago = options["seconds_ago"]
|
||||
seconds_ago += options["minutes_ago"] * 60
|
||||
seconds_ago += options["hours_ago"] * 3600
|
||||
if mapping_types:
|
||||
mapping_types = mapping_types.split(",")
|
||||
if not 1 <= percent <= 100:
|
||||
raise CommandError("percent should be between 1 and 100")
|
||||
if percent < 100 and seconds_ago:
|
||||
raise CommandError("you can't specify a time ago and percent")
|
||||
if criticalmass and seconds_ago:
|
||||
raise CommandError("you can't specify a time ago and criticalmass")
|
||||
if percent < 100 and criticalmass:
|
||||
raise CommandError("you can't specify criticalmass and percent")
|
||||
if mapping_types and criticalmass:
|
||||
raise CommandError("you can't specify criticalmass and mapping_types")
|
||||
|
||||
es_reindex_cmd(
|
||||
percent=percent,
|
||||
delete=delete,
|
||||
mapping_types=mapping_types,
|
||||
criticalmass=criticalmass,
|
||||
seconds_ago=seconds_ago,
|
||||
log=FakeLogger(self.stdout),
|
||||
)
|
|
@@ -1,25 +0,0 @@
from django.core.management.base import BaseCommand

from kitsune.search.es_utils import es_search_cmd
from kitsune.search.utils import FakeLogger


class Command(BaseCommand):
    help = "Does a front-page search for given query"

    def add_arguments(self, parser):
        super().add_arguments(parser)
        parser.add_argument("args", metavar="search_term", nargs="+")
        parser.add_argument(
            "--pages",
            type=int,
            dest="pages",
            default=1,
            help="Number of pages of results you want to see",
        )

    def handle(self, *args, **options):
        pages = options["pages"]
        query = " ".join(args)

        es_search_cmd(query, pages, FakeLogger(self.stdout))

@@ -1,19 +0,0 @@
from django.core.management.base import BaseCommand

from kitsune.search.es_utils import es_status_cmd
from kitsune.search.utils import FakeLogger


class Command(BaseCommand):
    help = "Shows elastic search index status."

    def add_arguments(self, parser):
        parser.add_argument(
            "--checkindex",
            action="store_true",
            dest="checkindex",
            help="Checks the index contents",
        )

    def handle(self, *args, **options):
        es_status_cmd(options["checkindex"], log=FakeLogger(self.stdout))

@@ -1,11 +0,0 @@
from django.core.management.base import BaseCommand

from kitsune.search.es_utils import es_verify_cmd
from kitsune.search.utils import FakeLogger


class Command(BaseCommand):
    help = "Verifies correctness of all things verifyable."

    def handle(self, *args, **options):
        es_verify_cmd(FakeLogger(self.stdout))

@@ -1,50 +1,14 @@
|
|||
import datetime
|
||||
import logging
|
||||
from threading import local
|
||||
|
||||
from django.conf import settings
|
||||
from django.core import signals
|
||||
from django.db import models
|
||||
from django.db.models.signals import m2m_changed, post_save, pre_delete
|
||||
from django.dispatch import receiver
|
||||
from elasticsearch.exceptions import NotFoundError
|
||||
from elasticutils.contrib.django import MLT, Indexable, MappingType
|
||||
from elasticutils.contrib.django import Indexable, MappingType
|
||||
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.search.tasks import index_task, unindex_task
|
||||
from kitsune.search.utils import to_class_path
|
||||
from kitsune.sumo.models import ModelBase
|
||||
|
||||
log = logging.getLogger("k.search.es")
|
||||
|
||||
|
||||
# db_table_name -> MappingType class
|
||||
_search_mapping_types = {}
|
||||
|
||||
|
||||
def get_mapping_types(mapping_types=None):
|
||||
"""Returns a list of MappingTypes"""
|
||||
if mapping_types is None:
|
||||
values = list(_search_mapping_types.values())
|
||||
else:
|
||||
values = [_search_mapping_types[name] for name in mapping_types]
|
||||
|
||||
# Sort to stabilize
|
||||
values.sort(key=lambda cls: cls.get_mapping_type_name())
|
||||
return values
|
||||
|
||||
|
||||
# Holds a threadlocal set of indexing tasks to be filed after the request.
|
||||
_local = local()
|
||||
|
||||
|
||||
def _local_tasks():
|
||||
"""(Create and) return the threadlocal set of indexing tasks."""
|
||||
if getattr(_local, "tasks", None) is None:
|
||||
_local.tasks = set()
|
||||
return _local.tasks
|
||||
|
||||
|
||||
class SearchMixin(object):
|
||||
"""A mixin which adds ES indexing support for the model
|
||||
|
||||
|
@@ -64,19 +28,15 @@ class SearchMixin(object):
|
|||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
"""Return the MappingType for this model"""
|
||||
raise NotImplementedError
|
||||
...
|
||||
|
||||
def index_later(self):
|
||||
"""Register myself to be indexed at the end of the request."""
|
||||
_local_tasks().add(
|
||||
(index_task.delay, (to_class_path(self.get_mapping_type()), (self.pk,)))
|
||||
)
|
||||
return
|
||||
|
||||
def unindex_later(self):
|
||||
"""Register myself to be unindexed at the end of the request."""
|
||||
_local_tasks().add(
|
||||
(unindex_task.delay, (to_class_path(self.get_mapping_type()), (self.pk,)))
|
||||
)
|
||||
return
|
||||
|
||||
|
||||
class SearchMappingType(MappingType, Indexable):
|
||||
|
@ -102,204 +62,45 @@ class SearchMappingType(MappingType, Indexable):
|
|||
|
||||
@classmethod
|
||||
def search(cls):
|
||||
return es_utils.Sphilastic(cls)
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def get_index(cls):
|
||||
return es_utils.write_index(cls.get_index_group())
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def get_index_group(cls):
|
||||
return "default"
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def get_query_fields(cls):
|
||||
"""Return the list of fields for query"""
|
||||
raise NotImplementedError
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def get_localized_fields(cls):
|
||||
return []
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def get_indexable(cls, seconds_ago=0):
|
||||
# Some models have a gazillion instances. So we want to go
|
||||
# through them one at a time in a way that doesn't pull all
|
||||
# the data into memory all at once. So we iterate through ids
|
||||
# and pull objects one at a time.
|
||||
qs = cls.get_model().objects.order_by("pk").values_list("pk", flat=True)
|
||||
if seconds_ago:
|
||||
if cls.seconds_ago_filter:
|
||||
dt = datetime.datetime.now() - datetime.timedelta(seconds=seconds_ago)
|
||||
qs = qs.filter(**{cls.seconds_ago_filter: dt})
|
||||
else:
|
||||
# if seconds_ago is specified but seconds_ago_filter is falsy, don't index anything
|
||||
return qs.none()
|
||||
|
||||
return qs
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def reshape(cls, results):
|
||||
"""Reshapes the results so lists are lists and everything is not"""
|
||||
# FIXME: This is dumb because we're changing the shape of the
|
||||
# results multiple times in a hokey-pokey kind of way. We
|
||||
# should fix this after SUMO is using Elasticsearch 1.x and it
|
||||
# probably involves an ElasticUtils rewrite or whatever the
|
||||
# next generation is.
|
||||
list_keys = cls.list_keys
|
||||
|
||||
# FIXME: This builds a new dict from the old dict. Might be
|
||||
# cheaper to do it in-place.
|
||||
return [
|
||||
dict((key, (val if key in list_keys else val[0])) for key, val in list(result.items()))
|
||||
for result in results
|
||||
]
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def index(cls, *args, **kwargs):
|
||||
if not settings.ES_LIVE_INDEXING:
|
||||
return
|
||||
|
||||
super(SearchMappingType, cls).index(*args, **kwargs)
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def unindex(cls, *args, **kwargs):
|
||||
if not settings.ES_LIVE_INDEXING:
|
||||
return
|
||||
|
||||
try:
|
||||
super(SearchMappingType, cls).unindex(*args, **kwargs)
|
||||
except NotFoundError:
|
||||
# Ignore the case where we try to delete something that's
|
||||
# not there.
|
||||
pass
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def morelikethis(cls, id_, s, fields):
|
||||
"""MoreLikeThis API"""
|
||||
return list(MLT(id_, s, fields, min_term_freq=1, min_doc_freq=1))
|
||||
|
||||
|
||||
def _identity(s):
|
||||
return s
|
||||
|
||||
|
||||
def register_for_indexing(app, sender_class, instance_to_indexee=_identity, m2m=False):
|
||||
"""Registers a model for signal-based live-indexing.
|
||||
|
||||
As data changes in the database, we need to update the relevant
|
||||
documents in the index. This function registers Django model
|
||||
classes with the appropriate signals and update/delete routines
|
||||
such that our index stays up-to-date.
|
||||
|
||||
:arg app: A bit of UID we use to build the signal handlers'
|
||||
dispatch_uids. This is prepended to the ``sender_class``
|
||||
model name, "elastic", and the signal name, so it should
|
||||
combine with those to make something unique. For this reason,
|
||||
the app name is usually a good choice, yielding something like
|
||||
"wiki.TaggedItem.elastic.post_save".
|
||||
:arg sender_class: The class to listen for saves and deletes on.
|
||||
:arg instance_to_indexee: A callable which takes the signalling
|
||||
instance and returns the model instance to be indexed. The
|
||||
returned instance should be a subclass of SearchMixin. If the
|
||||
callable returns None, no indexing is performed.
|
||||
|
||||
Default: a callable which returns the sender itself.
|
||||
:arg m2m: True if this is a m2m model and False otherwise.
|
||||
|
||||
Examples::
|
||||
|
||||
# Registers MyModel for indexing. post_save creates new
|
||||
# documents in the index. pre_delete removes documents
|
||||
# from the index.
|
||||
register_for_indexing('some_app', MyModel)
|
||||
|
||||
# Registers RelatedModel for indexing. RelatedModel is related
|
||||
# to some model in the sense that the document in the index is
|
||||
# composed of data from some model and its related
|
||||
# RelatedModel instance. Because of that when we update
|
||||
# RelatedModel instances, we need to update the associated
|
||||
# document in the index for the related model.
|
||||
#
|
||||
# This registers the RelatedModel for indexing. post_save and
|
||||
# pre_delete update the associated document in the index for
|
||||
# the related model. The related model instance is determined
|
||||
# by the instance_to_indexee function.
|
||||
register_for_indexing('some_app', RelatedModel,
|
||||
instance_to_indexee=lambda r: r.my_model)
|
||||
|
||||
|
||||
"""
|
||||
|
||||
def maybe_call_method(instance, is_raw, method_name):
|
||||
"""Call an (un-)indexing method on instance if appropriate."""
|
||||
obj = instance_to_indexee(instance)
|
||||
if obj is not None and not is_raw:
|
||||
getattr(obj, method_name)()
|
||||
|
||||
def update(sender, instance, **kw):
|
||||
"""File an add-to-index task for the indicated object."""
|
||||
maybe_call_method(instance, kw.get("raw"), "index_later")
|
||||
|
||||
def delete(sender, instance, **kw):
|
||||
"""File a remove-from-index task for the indicated object."""
|
||||
maybe_call_method(instance, kw.get("raw"), "unindex_later")
|
||||
|
||||
def indexing_receiver(signal, signal_name):
|
||||
"""Return a routine that registers signal handlers for indexers.
|
||||
|
||||
The returned registration routine uses strong refs, makes up a
|
||||
dispatch_uid, and uses ``sender_class`` as the sender.
|
||||
|
||||
"""
|
||||
return receiver(
|
||||
signal,
|
||||
sender=sender_class,
|
||||
dispatch_uid="%s.%s.elastic.%s" % (app, sender_class.__name__, signal_name),
|
||||
weak=False,
|
||||
)
|
||||
|
||||
if m2m:
|
||||
# This is an m2m model, so we register m2m_changed and it
|
||||
# updates the existing document in the index.
|
||||
indexing_receiver(m2m_changed, "m2m_changed")(update)
|
||||
|
||||
else:
|
||||
indexing_receiver(post_save, "post_save")(update)
|
||||
|
||||
indexing_receiver(pre_delete, "pre_delete")(
|
||||
# If it's the indexed instance that's been deleted, go ahead
|
||||
# and delete it from the index. Otherwise, we just want to
|
||||
# update whatever model it's related to.
|
||||
delete
|
||||
if instance_to_indexee is _identity
|
||||
else update
|
||||
)
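
A minimal sketch of how the pieces above fit together for a hypothetical model; MyModel, MyModelMappingType and the "myapp" label are illustrative names only, and a real mapping type would also need extract_document(), get_query_fields() and friends:

class MyModel(ModelBase, SearchMixin):
    title = models.CharField(max_length=255)

    @classmethod
    def get_mapping_type(cls):
        return MyModelMappingType


@register_mapping_type
class MyModelMappingType(SearchMappingType):
    @classmethod
    def get_model(cls):
        return MyModel


# post_save and pre_delete on MyModel now queue index/unindex tasks
# at the end of each request.
register_for_indexing("myapp", MyModel)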
|
||||
|
||||
|
||||
def register_mapping_type(cls):
|
||||
"""Class decorator for registering MappingTypes for search"""
|
||||
_search_mapping_types[cls.get_mapping_type_name()] = cls
|
||||
return cls
|
||||
|
||||
|
||||
def generate_tasks(**kwargs):
|
||||
"""Goes through thread local index update tasks set and generates
|
||||
celery tasks for all tasks in the set.
|
||||
|
||||
Because this works off of a set, it naturally de-dupes the tasks,
|
||||
so if four tasks get tossed into the set that are identical, we
|
||||
execute it only once.
|
||||
|
||||
"""
|
||||
tasks = _local_tasks()
|
||||
for fun, args in tasks:
|
||||
fun(*args)
|
||||
|
||||
tasks.clear()
|
||||
|
||||
|
||||
signals.request_finished.connect(generate_tasks)
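
A hedged illustration of the de-duplication described in the docstring above; SomeIndexableModel is a stand-in for any SearchMixin subclass, not a real kitsune model:

obj = SomeIndexableModel.objects.get(pk=42)
obj.index_later()    # adds (index_task.delay, (cls_path, (42,))) to the threadlocal set
obj.index_later()    # identical tuple, so the set still holds one entry
assert len(_local_tasks()) == 1
generate_tasks()     # fires index_task.delay(...) exactly once, then clears the set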
|
||||
...
|
||||
|
||||
|
||||
class RecordManager(models.Manager):
|
||||
|
|
|
@ -1,124 +0,0 @@
|
|||
from itertools import chain
|
||||
|
||||
from django.conf import settings
|
||||
|
||||
from elasticsearch import RequestsHttpConnection
|
||||
|
||||
from kitsune import search as constants
|
||||
from kitsune.questions.models import QuestionMappingType
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.wiki.models import DocumentMappingType
|
||||
|
||||
|
||||
def apply_boosts(searcher):
|
||||
"""Returns searcher with boosts applied"""
|
||||
return searcher.boost(
|
||||
question_title=4.0,
|
||||
question_content=3.0,
|
||||
question_answer_content=3.0,
|
||||
post_title=2.0,
|
||||
post_content=1.0,
|
||||
document_title=6.0,
|
||||
document_content=1.0,
|
||||
document_keywords=8.0,
|
||||
document_summary=2.0,
|
||||
# Text phrases in document titles and content get an extra boost.
|
||||
document_title__match_phrase=10.0,
|
||||
document_content__match_phrase=8.0,
|
||||
)
|
||||
|
||||
|
||||
def generate_simple_search(search_form, language, with_highlights=False):
|
||||
"""Generates an S given a form
|
||||
|
||||
:arg search_form: a validated SimpleSearch form
|
||||
:arg language: the language code
|
||||
:arg with_highlights: whether or not to ask for highlights
|
||||
|
||||
:returns: a fully formed S
|
||||
|
||||
"""
|
||||
# We use a regular S here because we want to search across
|
||||
# multiple doctypes.
|
||||
searcher = (
|
||||
es_utils.AnalyzerS()
|
||||
.es(
|
||||
urls=settings.ES_URLS,
|
||||
timeout=settings.ES_TIMEOUT,
|
||||
use_ssl=settings.ES_USE_SSL,
|
||||
http_auth=settings.ES_HTTP_AUTH,
|
||||
connection_class=RequestsHttpConnection,
|
||||
)
|
||||
.indexes(es_utils.read_index("default"))
|
||||
)
|
||||
|
||||
cleaned = search_form.cleaned_data
|
||||
|
||||
doctypes = []
|
||||
final_filter = es_utils.F()
|
||||
cleaned_q = cleaned["q"]
|
||||
products = cleaned["product"]
|
||||
|
||||
# Handle wiki filters
|
||||
if cleaned["w"] & constants.WHERE_WIKI:
|
||||
wiki_f = es_utils.F(
|
||||
model="wiki_document",
|
||||
document_category__in=settings.SEARCH_DEFAULT_CATEGORIES,
|
||||
document_locale=language,
|
||||
document_is_archived=False,
|
||||
)
|
||||
|
||||
for p in products:
|
||||
wiki_f &= es_utils.F(product=p)
|
||||
|
||||
doctypes.append(DocumentMappingType.get_mapping_type_name())
|
||||
final_filter |= wiki_f
|
||||
|
||||
# Handle question filters
|
||||
if cleaned["w"] & constants.WHERE_SUPPORT:
|
||||
question_f = es_utils.F(
|
||||
model="questions_question", question_is_archived=False, question_has_helpful=True
|
||||
)
|
||||
|
||||
for p in products:
|
||||
question_f &= es_utils.F(product=p)
|
||||
|
||||
doctypes.append(QuestionMappingType.get_mapping_type_name())
|
||||
final_filter |= question_f
|
||||
|
||||
# Build a filter for those filters and add the other bits to
|
||||
# finish the search
|
||||
searcher = searcher.doctypes(*doctypes)
|
||||
searcher = searcher.filter(final_filter)
|
||||
|
||||
if cleaned["explain"]:
|
||||
searcher = searcher.explain()
|
||||
|
||||
if with_highlights:
|
||||
# Set up the highlights. Show the entire field highlighted.
|
||||
searcher = searcher.highlight(
|
||||
"question_content", # support forum
|
||||
"document_summary", # kb
|
||||
pre_tags=["<b>"],
|
||||
post_tags=["</b>"],
|
||||
number_of_fragments=0,
|
||||
)
|
||||
|
||||
searcher = apply_boosts(searcher)
|
||||
|
||||
# Build the query
|
||||
query_fields = chain(
|
||||
*[cls.get_query_fields() for cls in [DocumentMappingType, QuestionMappingType]]
|
||||
)
|
||||
query = {}
|
||||
# Create match and match_phrase queries for every field
|
||||
# we want to search.
|
||||
for field in query_fields:
|
||||
for query_type in ["match", "match_phrase"]:
|
||||
query["%s__%s" % (field, query_type)] = cleaned_q
|
||||
|
||||
# Transform the query to use locale aware analyzers.
|
||||
query = es_utils.es_query_with_analyzer(query, language)
|
||||
|
||||
searcher = searcher.query(should=True, **query)
|
||||
return searcher
|
|
@ -1,83 +0,0 @@
|
|||
"""
|
||||
Utilities for working with synonyms, both in the database and in ES.
|
||||
"""
|
||||
|
||||
import re
|
||||
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.search.models import Synonym
|
||||
|
||||
|
||||
class SynonymParseError(Exception):
|
||||
"""One or more parser errors were found. Has a list of errors found."""
|
||||
|
||||
def __init__(self, errors, *args, **kwargs):
|
||||
super(SynonymParseError, self).__init__(*args, **kwargs)
|
||||
self.errors = errors
|
||||
|
||||
|
||||
def parse_synonyms(text):
|
||||
"""
|
||||
Parse synonyms from user entered text.
|
||||
|
||||
The input should look something like
|
||||
|
||||
foo => bar
|
||||
baz, qux => flob, glork
|
||||
|
||||
:returns: A set of 2-tuples, ``(from_words, to_words)``. ``from_words``
|
||||
and ``to_words`` will be strings.
|
||||
:throws: A SynonymParseError, if any errors are found.
|
||||
"""
|
||||
|
||||
errors = []
|
||||
synonyms = set()
|
||||
|
||||
for i, line in enumerate(text.split("\n"), 1):
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
count = line.count("=>")
|
||||
if count < 1:
|
||||
errors.append("Syntax error on line %d: No => found." % i)
|
||||
elif count > 1:
|
||||
errors.append("Syntax error on line %d: Too many => found." % i)
|
||||
else:
|
||||
from_words, to_words = [s.strip() for s in line.split("=>")]
|
||||
synonyms.add((from_words, to_words))
|
||||
|
||||
if errors:
|
||||
raise SynonymParseError(errors)
|
||||
else:
|
||||
return synonyms
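
A worked example of the parser above; the return value follows directly from the code:

text = """
social => facebook, twitter
address bar, location bar => awesome bar
"""
assert parse_synonyms(text) == {
    ("social", "facebook, twitter"),
    ("address bar, location bar", "awesome bar"),
}

# A line with zero or more than one "=>" is collected as an error instead,
# and parse_synonyms raises SynonymParseError with the messages in .errors.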
|
||||
|
||||
|
||||
def count_out_of_date():
|
||||
"""
|
||||
Count number of synonyms that differ between the database and ES.
|
||||
|
||||
:returns: A 2-tuple where the first element is the number of synonyms
|
||||
that are in the DB but not in ES, and the second element is the
|
||||
number of synonyms in ES that are not in the DB.
|
||||
"""
|
||||
es = es_utils.get_es()
|
||||
|
||||
index_name = es_utils.write_index("default")
|
||||
settings = es.indices.get_settings(index_name).get(index_name, {}).get("settings", {})
|
||||
|
||||
synonym_key_re = re.compile(r"index\.analysis\.filter\.synonyms-.*\.synonyms\.\d+")
|
||||
|
||||
synonyms_in_es = set()
|
||||
for key, val in list(settings.items()):
|
||||
if synonym_key_re.match(key):
|
||||
synonyms_in_es.add(val)
|
||||
|
||||
synonyms_in_db = set(str(s) for s in Synonym.objects.all())
|
||||
|
||||
synonyms_to_add = synonyms_in_db - synonyms_in_es
|
||||
synonyms_to_remove = synonyms_in_es - synonyms_in_db
|
||||
|
||||
if synonyms_to_remove == {"firefox => firefox"}:
|
||||
synonyms_to_remove = set()
|
||||
|
||||
return (len(synonyms_to_add), len(synonyms_to_remove))
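
A short, hedged example of consuming the return value; the admin synonym view is where this is used in practice, and the print below is only illustrative:

to_add, to_remove = count_out_of_date()
print("%d synonyms need to be pushed to ES; %d stale synonyms need removing" % (to_add, to_remove))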
|
|
@ -1,177 +0,0 @@
|
|||
import datetime
|
||||
import logging
|
||||
import sys
|
||||
import traceback
|
||||
|
||||
from celery import task
|
||||
from elasticutils.contrib.django import get_es
|
||||
from multidb.pinning import pin_this_thread, unpin_this_thread
|
||||
|
||||
from kitsune.search.es_utils import UnindexMeBro, get_analysis, index_chunk, write_index
|
||||
from kitsune.search.utils import from_class_path
|
||||
|
||||
# This is present in memcached when reindexing is in progress and
|
||||
# holds the number of outstanding index chunks. Once it hits 0,
|
||||
# indexing is done.
|
||||
OUTSTANDING_INDEX_CHUNKS = "search:outstanding_index_chunks"
|
||||
|
||||
CHUNK_SIZE = 50000
|
||||
|
||||
log = logging.getLogger("k.task")
|
||||
|
||||
|
||||
class IndexingTaskError(Exception):
|
||||
"""Exception that captures current exception information
|
||||
|
||||
Some exceptions aren't pickleable. This uses traceback module to
|
||||
format the exception that's currently being thrown and tosses it
|
||||
in the message of IndexingTaskError at the time the
|
||||
IndexingTaskError is created.
|
||||
|
||||
So you can do this::
|
||||
|
||||
try:
|
||||
# some code that throws an error
|
||||
except Exception as exc:
|
||||
raise IndexingTaskError()
|
||||
|
||||
The message will have the message and traceback from the original
|
||||
exception thrown.
|
||||
|
||||
Yes, this is goofy.
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super(IndexingTaskError, self).__init__(traceback.format_exc())
|
||||
|
||||
|
||||
@task()
|
||||
def index_chunk_task(write_index, batch_id, rec_id, chunk):
|
||||
"""Index a chunk of things.
|
||||
|
||||
:arg write_index: the name of the index to index to
|
||||
:arg batch_id: the name for the batch this chunk belongs to
|
||||
:arg rec_id: the id for the record for this task
|
||||
:arg chunk: a (class, id_list) of things to index
|
||||
"""
|
||||
cls_path, id_list = chunk
|
||||
cls = from_class_path(cls_path)
|
||||
rec = None
|
||||
|
||||
# Need to import Record here to prevent circular import
|
||||
from kitsune.search.models import Record
|
||||
|
||||
try:
|
||||
# Pin to master db to avoid replication lag issues and stale data.
|
||||
pin_this_thread()
|
||||
|
||||
# Update record data.
|
||||
rec = Record.objects.get(pk=rec_id)
|
||||
rec.start_time = datetime.datetime.now()
|
||||
rec.message = "Reindexing into %s" % write_index
|
||||
rec.status = Record.STATUS_IN_PROGRESS
|
||||
rec.save()
|
||||
|
||||
index_chunk(cls, id_list, reraise=True)
|
||||
rec.mark_success()
|
||||
|
||||
except Exception:
|
||||
if rec is not None:
|
||||
rec.mark_fail("Errored out %s %s" % (sys.exc_info()[0], sys.exc_info()[1]))
|
||||
|
||||
log.exception("Error while indexing a chunk")
|
||||
# Some exceptions aren't pickleable and we need this to throw
|
||||
# things that are pickleable.
|
||||
raise IndexingTaskError()
|
||||
|
||||
finally:
|
||||
unpin_this_thread()
|
||||
|
||||
|
||||
# Note: If you reduce the length of RETRY_TIMES, it affects all tasks
|
||||
# currently in the celery queue---they'll throw an IndexError.
|
||||
RETRY_TIMES = (
|
||||
60, # 1 minute
|
||||
5 * 60, # 5 minutes
|
||||
10 * 60, # 10 minutes
|
||||
30 * 60, # 30 minutes
|
||||
60 * 60, # 60 minutes
|
||||
)
|
||||
MAX_RETRIES = len(RETRY_TIMES)
|
||||
|
||||
|
||||
@task()
|
||||
def index_task(cls_path, id_list, **kw):
|
||||
"""Index documents specified by cls and ids"""
|
||||
cls = from_class_path(cls_path)
|
||||
try:
|
||||
# Pin to master db to avoid replication lag issues and stale
|
||||
# data.
|
||||
pin_this_thread()
|
||||
|
||||
qs = cls.get_model().objects.filter(pk__in=id_list).values_list("pk", flat=True)
|
||||
for id_ in qs:
|
||||
try:
|
||||
cls.index(cls.extract_document(id_), id_=id_)
|
||||
except UnindexMeBro:
|
||||
# If extract_document throws this, then we need to
|
||||
# remove this item from the index.
|
||||
cls.unindex(id_)
|
||||
|
||||
except Exception as exc:
|
||||
retries = index_task.request.retries
|
||||
if retries >= MAX_RETRIES:
|
||||
# Some exceptions aren't pickleable and we need this to
|
||||
# throw things that are pickleable.
|
||||
raise IndexingTaskError()
|
||||
|
||||
index_task.retry(exc=exc, max_retries=MAX_RETRIES, countdown=RETRY_TIMES[retries])
|
||||
finally:
|
||||
unpin_this_thread()
|
||||
|
||||
|
||||
@task()
|
||||
def unindex_task(cls_path, id_list, **kw):
|
||||
"""Unindex documents specified by cls and ids"""
|
||||
cls = from_class_path(cls_path)
|
||||
try:
|
||||
# Pin to master db to avoid replication lag issues and stale
|
||||
# data.
|
||||
pin_this_thread()
|
||||
for id_ in id_list:
|
||||
cls.unindex(id_)
|
||||
except Exception as exc:
|
||||
retries = unindex_task.request.retries
|
||||
if retries >= MAX_RETRIES:
|
||||
# Some exceptions aren't pickleable and we need this to
|
||||
# throw things that are pickleable.
|
||||
raise IndexingTaskError()
|
||||
|
||||
unindex_task.retry(exc=exc, max_retries=MAX_RETRIES, countdown=RETRY_TIMES[retries])
|
||||
finally:
|
||||
unpin_this_thread()
|
||||
|
||||
|
||||
@task()
|
||||
def update_synonyms_task():
|
||||
es = get_es()
|
||||
|
||||
# Close the index, update the settings, then re-open it.
|
||||
# This will cause search to be unavailable for a few seconds.
|
||||
# This updates all of the analyzer settings, which is kind of overkill,
|
||||
# but will make sure everything stays consistent.
|
||||
index = write_index("default")
|
||||
analysis = get_analysis()
|
||||
|
||||
# if anything goes wrong, it is very important to re-open the index.
|
||||
try:
|
||||
es.indices.close(index)
|
||||
es.indices.put_settings(
|
||||
index=index,
|
||||
body={
|
||||
"analysis": analysis,
|
||||
},
|
||||
)
|
||||
finally:
|
||||
es.indices.open(index)
|
|
@ -1,72 +0,0 @@
|
|||
{% extends "kadmin/base.html" %}
|
||||
|
||||
{% block content_title %}
|
||||
<h1>Elastic Search - Index Browser of Doom</h1>
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<section>
|
||||
<h1>Find a specific item</h1>
|
||||
<form method="GET">
|
||||
<label for="bucket-field">Model:</label>
|
||||
<select name="bucket" id="bucket-field">
|
||||
{% for bucket in buckets %}
|
||||
<option value="{{ bucket }}"{% if requested_bucket == bucket %} selected{% endif %}>{{ bucket }}</option>
|
||||
{% endfor %}
|
||||
</select>
|
||||
<label for="id-field">ID:</label>
|
||||
<input type="text" name="id" id="id-field" value="{{ requested_id }}">
|
||||
<input type="Submit">
|
||||
</form>
|
||||
</section>
|
||||
|
||||
{% if requested_data %}
|
||||
<section>
|
||||
<h1>Item {{ requested_data.id }} from {{ requested_bucket }}</h1>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>key</th>
|
||||
<th>value</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for key, val in requested_data.items %}
|
||||
<tr>
|
||||
<th>{{ key }}</th>
|
||||
<td>{{ val }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
{% else %}
|
||||
<section>
|
||||
<h1>Most recently indexed items per bucket</h1>
|
||||
{% for cls_name, items in last_20_by_bucket %}
|
||||
<h2>{{ cls_name }}</h2>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>id</th>
|
||||
<th>title</th>
|
||||
<th>indexed on ({{ settings.TIME_ZONE }})</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for item in items %}
|
||||
<tr>
|
||||
<td><a href="?bucket={{ cls_name }}&id={{ item.id }}">{{ item.id }}</a></td>
|
||||
{# cheating here because only one of these is filled, but #}
|
||||
{# in django templates, you get an empty string if it's #}
|
||||
{# not there. #}
|
||||
<td>{{ item.question_title }}{{ item.post_title }}{{ item.document_title }}</td>
|
||||
<td>{{ item.indexed_on }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% endfor %}
|
||||
</section>
|
||||
{% endif %}
|
||||
{% endblock %}
|
|
@ -1,382 +0,0 @@
|
|||
{% extends "kadmin/base.html" %}
|
||||
{% block content_title %}
|
||||
<h1>Elastic Search</h1>
|
||||
{% endblock %}
|
||||
|
||||
{% block extrastyle %}
|
||||
{{ block.super }}
|
||||
<style type="text/css">
|
||||
div#content div {
|
||||
margin-bottom: .5em;
|
||||
}
|
||||
.disabled {
|
||||
color: #ccc;
|
||||
}
|
||||
progress {
|
||||
width: 400px;
|
||||
}
|
||||
dd {
|
||||
margin-left: 1em;
|
||||
}
|
||||
input[type="submit"].DANGER {
|
||||
border: 3px red solid;
|
||||
font: bold 12px/14px serif;
|
||||
}
|
||||
.errorspan {
|
||||
background: #ffc;
|
||||
border: 1px solid red;
|
||||
padding: 1.5px;
|
||||
}
|
||||
.errorspan img {
|
||||
transform: translate(0,-1px);
|
||||
}
|
||||
table.reindextable td.explanation {
|
||||
width: 40%;
|
||||
}
|
||||
</style>
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<section>
|
||||
<p>
|
||||
Page last rendered: {{ now }} {{ settings.TIME_ZONE }}
|
||||
</p>
|
||||
</section>
|
||||
|
||||
{% if error_messages %}
|
||||
<section>
|
||||
<h1>Errors</h1>
|
||||
{% for msg in error_messages %}
|
||||
<p>{{ msg }}</p>
|
||||
{% endfor %}
|
||||
</section>
|
||||
{% endif %}
|
||||
|
||||
{% if outstanding_records.count > 0 %}
|
||||
<section>
|
||||
<p>
|
||||
Auto-refreshing every 30 seconds :: <a href="{{ request.path }}">Refresh page</a>
|
||||
</p>
|
||||
<script>setTimeout("window.location.reload(true);", 30000);</script>
|
||||
|
||||
<h2>{{ outstanding_records.count }} outstanding records</h2>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>batch</th>
|
||||
<th>name</th>
|
||||
<th>created</th>
|
||||
<th>start</th>
|
||||
<th>end</th>
|
||||
<th>message</th>
|
||||
<th>delta</th>
|
||||
</tr>
|
||||
</thead>
|
||||
{% for record in outstanding_records %}
|
||||
<tr>
|
||||
<td>{{ record.batch_id }}</td>
|
||||
<td>{{ record.name }}</td>
|
||||
<td>{{ record.creation_time }}</td>
|
||||
<td>{{ record.start_time }}</td>
|
||||
<td>{{ record.end_time }}</td>
|
||||
<td>{{ record.message }}</td>
|
||||
<td>{{ record.delta }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</table>
|
||||
</section>
|
||||
{% endif %}
|
||||
|
||||
{% if outstanding_chunks %}
|
||||
<section>
|
||||
<h1>Indexing in progress! Outstanding tasks: {{ outstanding_chunks }}</h1>
|
||||
<p>
|
||||
</p>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>message</th>
|
||||
<th>start time</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for record in outstanding_records %}
|
||||
<tr>
|
||||
<td>{{ record.text }}</td>
|
||||
<td>{{ record.starttime }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
<p>
|
||||
Note: The number of records may not line up with the number of
|
||||
outstanding indexing tasks because records are created when
|
||||
the task starts.
|
||||
</p>
|
||||
</section>
|
||||
{% endif %}
|
||||
|
||||
<section>
|
||||
<h1>Settings and Elasticsearch details</h1>
|
||||
<p>
|
||||
Settings at the time this page was loaded:
|
||||
</p>
|
||||
<table>
|
||||
<tr><th>ES_LIVE_INDEXING</th><td>{{ settings.ES_LIVE_INDEXING }}</td></tr>
|
||||
<tr><th>ES_INDEX_PREFIX</th><td>{{ settings.ES_INDEX_PREFIX }}</td></tr>
|
||||
<tr><th>ES_INDEXES</th><td>{{ settings.ES_INDEXES }}</td></tr>
|
||||
<tr><th>ES_WRITE_INDEXES</th><td>{{ settings.ES_WRITE_INDEXES }}</td></tr>
|
||||
<tr><th>Elasticsearch version</th><td>{{ es_deets.version.number }}</td></tr>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<h1>Index Status</h1>
|
||||
<p>
|
||||
All available indexes:
|
||||
</p>
|
||||
<table>
|
||||
<thead>
|
||||
<th>Index Name</th>
|
||||
<th>Documents</th>
|
||||
<th>Type</th>
|
||||
<th>Delete</th>
|
||||
</thead>
|
||||
|
||||
<tbody>
|
||||
{% for index_name, index_count in indexes %}
|
||||
<tr>
|
||||
<td>{{ index_name }}</td>
|
||||
<td>{{ index_count }}</td>
|
||||
<td>
|
||||
{% if index_name in read_indexes and index_name in write_indexes %}
|
||||
READ/WRITE
|
||||
{% else %}
|
||||
{% if index_name in read_indexes %}
|
||||
READ
|
||||
{% else %}
|
||||
{% if index_name in write_indexes %}
|
||||
WRITE
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
</td>
|
||||
{% if index_name not in read_indexes %}
|
||||
<td>
|
||||
<form method="POST">
|
||||
{% csrf_token %}
|
||||
<input type="hidden" name="delete_index" value="{{ index_name }}">
|
||||
<input type="submit" value="Delete">
|
||||
</form>
|
||||
</td>
|
||||
{% else %}
|
||||
<td>Disabled</td>
|
||||
{% endif %}
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<h2>Read indexes</h2>
|
||||
{% for index, stats in doctype_stats.items %}
|
||||
<h3>{{ index }}</h3>
|
||||
{% if stats == None %}
|
||||
<p>Index does not exist.</p>
|
||||
{% else %}
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>doctype</th><th>count</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for doctype, count in stats.items %}
|
||||
<tr><td>{{ doctype }}</td><td>{{ count }}</td></tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
|
||||
<h2>Write indexes</h2>
|
||||
{% if read_indexes == write_indexes %}
|
||||
<p>
|
||||
Write indexes are the same as the read indexes.
|
||||
</p>
|
||||
{% else %}
|
||||
{% for index, stats in doctype_stats.items %}
|
||||
<h3>{{ index }}</h3>
|
||||
{% if stats == None %}
|
||||
<p>Index does not exist.</p>
|
||||
{% else %}
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>doctype</th><th>count</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for doctype, count in stats.items %}
|
||||
<tr><td>{{ doctype }}</td><td>{{ count }}</td></tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<h1>Actions</h1>
|
||||
<table class="reindextable">
|
||||
<tr>
|
||||
<td colspan="2">
|
||||
<h2>REINDEX into existing index</h2>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class="explanation">
|
||||
<p>
|
||||
Reindex into the existing WRITE index. Don't do this if you've
|
||||
made mapping changes since this does not recreate the index with
|
||||
the new mappings.
|
||||
</p>
|
||||
{% if outstanding_chunks %}
|
||||
<p class="errornote">
|
||||
WARNING! There are outstanding index tasks! Don't launch another
|
||||
indexing pass unless you really know you want to.
|
||||
</p>
|
||||
{% endif %}
|
||||
{% if not settings.ES_LIVE_INDEXING %}
|
||||
<p class="errornote">
|
||||
WARNING! <tt>ES_LIVE_INDEXING</tt> is False so you can't
|
||||
reindex via the admin. Either enable <tt>ES_LIVE_INDEXING</tt>
|
||||
or use the command line <tt>./manage.py esreindex</tt>.
|
||||
</p>
|
||||
{% endif %}
|
||||
</td>
|
||||
<td>
|
||||
{% if doctype_write_stats != None %}
|
||||
<form method="POST">
|
||||
{% csrf_token %}
|
||||
{% for index, stats in doctype_write_stats.items %}
|
||||
<h3>{{ index }}</h3>
|
||||
{% for doctype, count in stats.items %}
|
||||
<input id="check_{{ doctype }}" type="checkbox" name="check_{{ doctype }}" value="yes" checked>
|
||||
<label for="check_{{ doctype }}">{{ doctype }}</label><br>
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
<input type="submit" name="reindex" value="Reindex into write indexes"
|
||||
{% if not settings.ES_LIVE_INDEXING or outstanding_chunks %}disabled{% endif %}>
|
||||
</form>
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td colspan="2">
|
||||
<h2>DELETE existing index group's write index, recreate it and reindex</h2>
|
||||
</td>
|
||||
</tr>
<tr>
|
||||
<td class="explanation">
|
||||
<p>
|
||||
This <strong>DELETES</strong> the existing WRITE index for a
|
||||
group, recreates it with the mappings, and indexes into the new
|
||||
index. You should have to do this only when the search mapping
|
||||
changes or when setting up the site for the first time.
|
||||
</p>
|
||||
{% if read_indexes == write_indexes %}
|
||||
<p class="errornote">
|
||||
WARNING! All read and write indexes are the same! Deleting and
|
||||
rebuilding the index would be really bad!
|
||||
</p>
|
||||
{% endif %}
|
||||
</td>
|
||||
<td>
|
||||
<form method="POST">
|
||||
<table>
|
||||
<tr>
|
||||
<th></th>
|
||||
<th>Group</th>
|
||||
<th>Read Index</th>
|
||||
<th>Write Index</th>
|
||||
</tr>
|
||||
{% for group, group_read_index, group_write_index in index_group_data %}
|
||||
<tr>
|
||||
<td>
|
||||
<input id="check_{{ group }}" type="checkbox" name="check_{{ group }}" value="yes"
|
||||
{% if group_read_index != group_write_index %}checked{% endif %}>
|
||||
</td>
|
||||
<td><label for="check_{{ group }}">{{ group }}</label></td>
|
||||
<td>{{ group_read_index }}</td>
|
||||
<td>{{ group_write_index }}</td>
|
||||
<td>
|
||||
{% if group_read_index == group_write_index %}
|
||||
<span class="errorspan">
|
||||
<img src="{{ STATIC_URL }}admin/img/icon_error.gif" />
|
||||
This group's write index is a read index!
|
||||
</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</table>
|
||||
|
||||
{% csrf_token %}
|
||||
<input class="DANGER" type="submit" name="recreate_index" value="DELETE selected indexes and reindex"
|
||||
{% if not settings.ES_LIVE_INDEXING or outstanding_chunks %}disabled{% endif %}>
|
||||
</form>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td colspan="2">
|
||||
<h2>RESET records and mark as failed</h2>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td class="explanation">
|
||||
<p>
|
||||
This marks outstanding records as failed. This allows you to run a
|
||||
new reindexing pass.
|
||||
</p>
|
||||
</td>
|
||||
<td>
|
||||
<form method="POST">
|
||||
{% csrf_token %}
|
||||
<input type="hidden" name="reset" value="1">
|
||||
<input type="submit" name="reset" value="Mark records as failed">
|
||||
</form>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<h1>Reindexing history</h1>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>batch</th>
|
||||
<th>name</th>
|
||||
<th>created</th>
|
||||
<th>start</th>
|
||||
<th>end</th>
|
||||
<th>status</th>
|
||||
<th>message</th>
|
||||
<th>delta</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for record in recent_records %}
|
||||
<tr>
|
||||
<td>{{ record.batch_id }}</td>
|
||||
<td>{{ record.name }}</td>
|
||||
<td>{{ record.creation_time }}</td>
|
||||
<td>{{ record.start_time }}</td>
|
||||
<td>{{ record.end_time }}</td>
|
||||
<td>{{ record.status }}</td>
|
||||
<td>{{ record.message }}</td>
|
||||
<td>{{ record.delta }}</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
{% endblock %}
|
|
@ -1,14 +0,0 @@
|
|||
{% extends "kadmin/base.html" %}
|
||||
|
||||
{% block content_title %}
|
||||
<h1>Elastic Search - Mapping Browser</h1>
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<section>
|
||||
<h1>Merged Mapping</h1>
|
||||
<pre>
|
||||
{{ mapping }}
|
||||
</pre>
|
||||
</section>
|
||||
{% endblock %}
|
|
@ -1,150 +0,0 @@
|
|||
{% extends "kadmin/base.html" %}
|
||||
|
||||
{% block content_title %}
|
||||
<h1>Elastic Search - Synonym Editor</h1>
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<style>
|
||||
.errornote,
|
||||
.notice,
|
||||
p,
|
||||
textarea {
|
||||
box-sizing: border-box;
|
||||
max-width: 600px;
|
||||
}
|
||||
|
||||
.notice {
|
||||
font-weight: bold;
|
||||
}
|
||||
|
||||
textarea {
|
||||
height: 400px;
|
||||
width: 600px;
|
||||
border-left: 0;
|
||||
margin-left: 0;
|
||||
padding-left: 7px;
|
||||
line-height: 14px;
|
||||
}
|
||||
|
||||
.line-numbers {
|
||||
box-sizing: border-box;
|
||||
display: inline-block;
|
||||
font-size: 9px;
|
||||
line-height: 14px;
|
||||
margin: 2px 0;
|
||||
height: 400px;
|
||||
overflow: hidden;
|
||||
padding: 4px 7px 2px 5px;
|
||||
border: 1px solid #ccc;
|
||||
border-right: 1px dotted #ddd;
|
||||
text-align: right;
|
||||
opacity: 0.8;
|
||||
}
|
||||
</style>
|
||||
|
||||
<section>
|
||||
<p>
|
||||
There are currently {{ synonym_add_count }}
|
||||
synonym{{ synonym_add_count|pluralize }} that have not been synced to ES,
|
||||
and {{ synonym_remove_count }} synonym{{ synonym_remove_count|pluralize }}
|
||||
that need to be removed from ES.
|
||||
</p>
|
||||
|
||||
<form method="POST">
|
||||
{% csrf_token %}
|
||||
<input type="hidden" name="sync_synonyms" value="1">
|
||||
<input type="submit" value="Sync synonyms to ES">
|
||||
</form>
|
||||
|
||||
<p>
|
||||
Press this button to update the synonym list in Elasticsearch to match
|
||||
what is in the database.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Keep in mind that changing synonyms will cause a small down time
|
||||
to the search system, during which time users will receive a friendly
|
||||
error message, and some parts of the site will be slower. This downtime
|
||||
should only last a few seconds. Consider doing this during off-peak
|
||||
hours, like after 00:00 UTC.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<p class="notice">
|
||||
This is an advanced way to edit synonyms with no training wheels.
|
||||
It is intended for bulk insertion and mass editing by expert users.
|
||||
If that doesn't sound like you, you can use the
|
||||
<a href="{% url 'admin:search_synonym_changelist' %}">simpler interface</a>
|
||||
instead. Remember to come back here and click that sync button above
|
||||
after editing the synonyms.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This is <strong>all</strong> the synonyms for the site. If you add a line,
|
||||
it will create a new synonym set. If you delete a line, that synonym will
|
||||
be deleted. Be careful!
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The format here is one synonym set per line. A synonym set is a set of
|
||||
words on the left that will be transformed into the set of words on the right, with a
|
||||
"fat arrow" (<code>=></code>) in between. Each of the words on the left
|
||||
set will be converted to all of the words on the right. For example a line
|
||||
like <code>social => facebook, twitter</code> would make a search for
|
||||
"social integration" match all of the documents "facebook integration"
|
||||
and "twitter integration". It's fine to have multi-word phrases like
|
||||
<code>address bar, location bar, awesome bar => address bar, location bar, awesome bar</code>,
|
||||
which would make those three phrases completely interchangeable.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that the original word is lost during the conversion. If you want to
|
||||
keep the original word(s) in the search, include those words on the right.
|
||||
For example, <code>social => facebook, twitter, social</code>. Also,
|
||||
synonyms are one way only. If you want two way synonyms, you need two
|
||||
lines, or to put all words on both sides.
|
||||
</p>
|
||||
|
||||
{% for error in errors %}
|
||||
<span class="errornote">
|
||||
{{ error }}
|
||||
</span>
|
||||
{% endfor %}
|
||||
|
||||
<form method="POST">
|
||||
{% csrf_token %}
|
||||
<input type="submit" value="Save">
|
||||
<br>
|
||||
<!-- No space between these elements. -->
|
||||
<div class="line-numbers"></div><textarea name="synonyms_text">{{ synonyms_text }}</textarea>
|
||||
</form>
|
||||
|
||||
<p>Note: those are line numbers, not ID numbers.</p>
|
||||
</section>
|
||||
|
||||
<script type="text/javascript">
|
||||
// jquery isn't loaded yet. lame.
|
||||
var textbox = document.querySelector('[name=synonyms_text]');
|
||||
var lineNums = document.querySelector('.line-numbers');
|
||||
function makeLineNumbers() {
|
||||
var numLines = (textbox.value.match(/\n/g) || []).length + 1;  // handle the no-newline case
|
||||
|
||||
var linesHtml = '';
|
||||
for (var i = 1; i <= numLines; i++) {
|
||||
linesHtml += i + '<br>';
|
||||
}
|
||||
lineNums.innerHTML = linesHtml;
|
||||
lineNums.scrollTop = textbox.scrollTop;
|
||||
}
|
||||
|
||||
textbox.addEventListener('change', makeLineNumbers);
|
||||
textbox.addEventListener('keyup', makeLineNumbers);
|
||||
textbox.addEventListener('scroll', function() {
|
||||
lineNums.scrollTop = textbox.scrollTop;
|
||||
});
|
||||
makeLineNumbers();
|
||||
</script>
|
||||
|
||||
{% endblock %}
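
Roughly what the synonym lines described in this template become on the Elasticsearch side is sketched below; the filter name and index handling are illustrative rather than kitsune's exact analysis config, but the close/put_settings/open dance mirrors update_synonyms_task above:

es = es_utils.get_es()
index = es_utils.write_index("default")

analysis = {
    "filter": {
        "synonyms-english": {          # illustrative filter name
            "type": "synonym",
            "synonyms": [
                "social => facebook, twitter, social",
                "address bar, location bar, awesome bar => address bar, location bar, awesome bar",
            ],
        }
    }
}

try:
    es.indices.close(index)
    es.indices.put_settings(index=index, body={"analysis": analysis})
finally:
    es.indices.open(index)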
|
|
@ -1,26 +0,0 @@
|
|||
from django.test.client import RequestFactory
|
||||
from django.test.utils import override_settings
|
||||
|
||||
import factory
|
||||
|
||||
from kitsune.search.models import Synonym
|
||||
from kitsune.sumo.tests import TestCase
|
||||
|
||||
|
||||
# Dummy request for passing to question_searcher() and brethren.
|
||||
dummy_request = RequestFactory().get("/")
|
||||
|
||||
|
||||
@override_settings(ES_LIVE_INDEXING=True)
|
||||
class ElasticTestCase(TestCase):
|
||||
"""Base class for Elastic Search tests, providing some conveniences"""
|
||||
|
||||
search_tests = True
|
||||
|
||||
|
||||
class SynonymFactory(factory.DjangoModelFactory):
|
||||
class Meta:
|
||||
model = Synonym
|
||||
|
||||
from_words = "foo, bar"
|
||||
to_words = "baz"
|
|
@ -1,276 +0,0 @@
|
|||
import json
|
||||
import time
|
||||
|
||||
from nose.tools import eq_
|
||||
from rest_framework.test import APIClient
|
||||
|
||||
from django.conf import settings
|
||||
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.questions.tests import QuestionFactory, AnswerFactory
|
||||
from kitsune.products.tests import ProductFactory
|
||||
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
|
||||
|
||||
|
||||
class SuggestViewTests(ElasticTestCase):
|
||||
client_class = APIClient
|
||||
|
||||
# TODO: This should probably be a subclass of QuestionFactory
|
||||
def _make_question(self, solved=True, **kwargs):
|
||||
defaults = {
|
||||
"title": "Login to website comments disabled " + str(time.time()),
|
||||
"content": """
|
||||
readersupportednews.org, sends me emails with a list of
|
||||
articles to read.
|
||||
|
||||
The links to the articles work as normal, except that I
|
||||
cannot login from the linked article - as required - to
|
||||
send my comments.
|
||||
|
||||
I see a javascript activity statement at the bottom left
|
||||
corner of my screen while the left button is depressed
|
||||
on the Login button. it is gone when I release the left
|
||||
button, but no results.
|
||||
|
||||
I have the latest (7) version of java enabled, on an XP
|
||||
box.
|
||||
|
||||
Why this inability to login to this website commentary?
|
||||
""",
|
||||
}
|
||||
defaults.update(kwargs)
|
||||
q = QuestionFactory(**defaults)
|
||||
if solved:
|
||||
a = AnswerFactory(question=q)
|
||||
q.solution = a
|
||||
# Trigger a reindex for the question.
|
||||
q.save()
|
||||
return q
|
||||
|
||||
# TODO: This should probably be a subclass of DocumentFactory
|
||||
def _make_document(self, **kwargs):
|
||||
defaults = {
|
||||
"title": "How to make a pie from scratch with email " + str(time.time()),
|
||||
"category": 10,
|
||||
}
|
||||
|
||||
defaults.update(kwargs)
|
||||
d = DocumentFactory(**defaults)
|
||||
RevisionFactory(document=d, is_approved=True)
|
||||
d.save()
|
||||
return d
|
||||
|
||||
def test_invalid_product(self):
|
||||
res = self.client.get(reverse("search.suggest"), {"product": "nonexistant", "q": "search"})
|
||||
eq_(res.status_code, 400)
|
||||
eq_(res.data, {"product": ['Could not find product with slug "nonexistant".']})
|
||||
|
||||
def test_invalid_locale(self):
|
||||
res = self.client.get(reverse("search.suggest"), {"locale": "bad-medicine", "q": "search"})
|
||||
eq_(res.status_code, 400)
|
||||
eq_(res.data, {"locale": ['Could not find locale "bad-medicine".']})
|
||||
|
||||
def test_invalid_fallback_locale_none_case(self):
|
||||
# Test the locale -> locale case.
|
||||
non_none_locale_fallback_pairs = [
|
||||
(key, val)
|
||||
for key, val in sorted(settings.NON_SUPPORTED_LOCALES.items())
|
||||
if val is not None
|
||||
]
|
||||
locale, fallback = non_none_locale_fallback_pairs[0]
|
||||
|
||||
res = self.client.get(reverse("search.suggest"), {"locale": locale, "q": "search"})
|
||||
eq_(res.status_code, 400)
|
||||
error_message = '"{0}" is not supported, but has fallback locale "{1}".'.format(
|
||||
locale, fallback
|
||||
)
|
||||
eq_(res.data, {"locale": [error_message]})
|
||||
|
||||
def test_invalid_fallback_locale_non_none_case(self):
|
||||
# Test the locale -> None case which falls back to WIKI_DEFAULT_LANGUAGE.
|
||||
has_none_locale_fallback_pairs = [
|
||||
(key, val)
|
||||
for key, val in sorted(settings.NON_SUPPORTED_LOCALES.items())
|
||||
if val is None
|
||||
]
|
||||
locale, fallback = has_none_locale_fallback_pairs[0]
|
||||
|
||||
res = self.client.get(reverse("search.suggest"), {"locale": locale, "q": "search"})
|
||||
eq_(res.status_code, 400)
|
||||
error_message = '"{0}" is not supported, but has fallback locale "{1}".'.format(
|
||||
locale, settings.WIKI_DEFAULT_LANGUAGE
|
||||
)
|
||||
eq_(res.data, {"locale": [error_message]})
|
||||
|
||||
def test_invalid_numbers(self):
|
||||
res = self.client.get(
|
||||
reverse("search.suggest"),
|
||||
{
|
||||
"max_questions": "a",
|
||||
"max_documents": "b",
|
||||
"q": "search",
|
||||
},
|
||||
)
|
||||
eq_(res.status_code, 400)
|
||||
eq_(
|
||||
res.data,
|
||||
{
|
||||
"max_questions": ["A valid integer is required."],
|
||||
"max_documents": ["A valid integer is required."],
|
||||
},
|
||||
)
|
||||
|
||||
def test_q_required(self):
|
||||
res = self.client.get(reverse("search.suggest"))
|
||||
eq_(res.status_code, 400)
|
||||
eq_(res.data, {"q": ["This field is required."]})
|
||||
|
||||
def test_it_works(self):
|
||||
q1 = self._make_question()
|
||||
d1 = self._make_document()
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
eq_([q["id"] for q in req.data["questions"]], [q1.id])
|
||||
eq_([d["title"] for d in req.data["documents"]], [d1.title])
|
||||
|
||||
def test_filters_in_postdata(self):
|
||||
q1 = self._make_question()
|
||||
d1 = self._make_document()
|
||||
self.refresh()
|
||||
|
||||
data = json.dumps({"q": "emails"})
|
||||
# Note: Have to use .generic() because .get() will convert the
|
||||
# data into querystring params and then it's clownshoes all
|
||||
# the way down.
|
||||
req = self.client.generic(
|
||||
"GET", reverse("search.suggest"), data=data, content_type="application/json"
|
||||
)
|
||||
eq_(req.status_code, 200)
|
||||
eq_([q["id"] for q in req.data["questions"]], [q1.id])
|
||||
eq_([d["title"] for d in req.data["documents"]], [d1.title])
|
||||
|
||||
def test_both_querystring_and_body_raises_error(self):
|
||||
self._make_question()
|
||||
self._make_document()
|
||||
self.refresh()
|
||||
|
||||
data = json.dumps({"q": "emails"})
|
||||
# Note: Have to use .generic() because .get() will convert the
|
||||
# data into querystring params and then it's clownshoes all
|
||||
# the way down.
|
||||
req = self.client.generic(
|
||||
"GET",
|
||||
reverse("search.suggest") + "?max_documents=3",
|
||||
data=data,
|
||||
content_type="application/json",
|
||||
)
|
||||
eq_(req.status_code, 400)
|
||||
eq_(
|
||||
req.data,
|
||||
{"detail": "Put all parameters either in the querystring or the HTTP request body."},
|
||||
)
|
||||
|
||||
def test_questions_max_results_0(self):
|
||||
self._make_question()
|
||||
self.refresh()
|
||||
|
||||
# Make sure something matches the query first.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
eq_(len(req.data["questions"]), 1)
|
||||
|
||||
# If we specify "don't give me any" make sure we don't get any.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_questions": "0"})
|
||||
eq_(len(req.data["questions"]), 0)
|
||||
|
||||
def test_questions_max_results_non_0(self):
|
||||
self._make_question()
|
||||
self._make_question()
|
||||
self._make_question()
|
||||
self._make_question()
|
||||
self._make_question()
|
||||
self.refresh()
|
||||
|
||||
# Make sure something matches the query first.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
eq_(len(req.data["questions"]), 5)
|
||||
|
||||
# Make sure we get only 3.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_questions": "3"})
|
||||
eq_(len(req.data["questions"]), 3)
|
||||
|
||||
def test_documents_max_results_0(self):
|
||||
self._make_document()
|
||||
self.refresh()
|
||||
|
||||
# Make sure something matches the query first.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
eq_(len(req.data["documents"]), 1)
|
||||
|
||||
# If we specify "don't give me any" make sure we don't get any.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_documents": "0"})
|
||||
eq_(len(req.data["documents"]), 0)
|
||||
|
||||
def test_documents_max_results_non_0(self):
|
||||
self._make_document()
|
||||
self._make_document()
|
||||
self._make_document()
|
||||
self._make_document()
|
||||
self._make_document()
|
||||
self.refresh()
|
||||
|
||||
# Make sure something matches the query first.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
eq_(len(req.data["documents"]), 5)
|
||||
|
||||
# Make sure we get only 3.
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_documents": "3"})
|
||||
eq_(len(req.data["documents"]), 3)
|
||||
|
||||
def test_product_filter_works(self):
|
||||
p1 = ProductFactory()
|
||||
p2 = ProductFactory()
|
||||
q1 = self._make_question(product=p1)
|
||||
self._make_question(product=p2)
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "product": p1.slug})
|
||||
eq_([q["id"] for q in req.data["questions"]], [q1.id])
|
||||
|
||||
def test_locale_filter_works_for_questions(self):
|
||||
q1 = self._make_question(locale="fr")
|
||||
self._make_question(locale="en-US")
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "locale": "fr"})
|
||||
eq_([q["id"] for q in req.data["questions"]], [q1.id])
|
||||
|
||||
def test_locale_filter_works_for_documents(self):
|
||||
d1 = self._make_document(slug="right-doc", locale="fr")
|
||||
self._make_document(slug="wrong-doc", locale="en-US")
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails", "locale": "fr"})
|
||||
eq_([d["slug"] for d in req.data["documents"]], [d1.slug])
|
||||
|
||||
def test_serializer_fields(self):
|
||||
"""Test that fields from the serializer are included."""
|
||||
self._make_question()
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
# Check that a field that is only available from the DB is in the response.
|
||||
assert "metadata" in req.data["questions"][0]
|
||||
|
||||
def test_only_solved(self):
|
||||
"""Test that only solved questions are suggested."""
|
||||
q1 = self._make_question(solved=True)
|
||||
q2 = self._make_question(solved=False)
|
||||
self.refresh()
|
||||
|
||||
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
|
||||
ids = [q["id"] for q in req.data["questions"]]
|
||||
assert q1.id in ids
|
||||
assert q2.id not in ids
|
||||
eq_(len(ids), 1)
|
|
@ -1,58 +0,0 @@
|
|||
from django.core.management import call_command
|
||||
|
||||
from unittest import mock
|
||||
|
||||
from kitsune.products.tests import ProductFactory
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.search.tests import ElasticTestCase
|
||||
from kitsune.search.utils import FakeLogger
|
||||
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
|
||||
|
||||
|
||||
class ESCommandTests(ElasticTestCase):
|
||||
@mock.patch.object(FakeLogger, "_out")
|
||||
def test_search(self, _out):
|
||||
"""Test that es_search command doesn't fail"""
|
||||
call_command("essearch", "cupcakes")
|
||||
|
||||
p = ProductFactory(title="firefox", slug="desktop")
|
||||
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
call_command("essearch", "cupcakes")
|
||||
|
||||
@mock.patch.object(FakeLogger, "_out")
|
||||
def test_reindex(self, _out):
|
||||
p = ProductFactory(title="firefox", slug="desktop")
|
||||
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
call_command("esreindex")
|
||||
call_command("esreindex", "--percent=50")
|
||||
call_command("esreindex", "--seconds-ago=60")
|
||||
call_command("esreindex", "--criticalmass")
|
||||
call_command("esreindex", "--mapping_types=wiki_documents")
|
||||
call_command("esreindex", "--delete")
|
||||
|
||||
@mock.patch.object(FakeLogger, "_out")
|
||||
def test_status(self, _out):
|
||||
p = ProductFactory(title="firefox", slug="desktop")
|
||||
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
call_command("esstatus")
|
||||
|
||||
@mock.patch.object(FakeLogger, "_out")
|
||||
def test_delete(self, _out):
|
||||
# Note: The read indexes and the write indexes are the same in
|
||||
# the tests, so we only have to do this once.
|
||||
indexes = es_utils.all_read_indexes()
|
||||
indexes.append("cupcakerainbow_index")
|
||||
for index in indexes:
|
||||
call_command("esdelete", index, noinput=True)
|
|
@ -1,286 +0,0 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
import json
|
||||
import unittest
|
||||
|
||||
from django.contrib.sites.models import Site
|
||||
|
||||
from unittest import mock
|
||||
from nose.tools import eq_
|
||||
|
||||
from kitsune.questions.models import QuestionMappingType
|
||||
from kitsune.questions.tests import QuestionFactory, AnswerFactory, AnswerVoteFactory
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.search.models import generate_tasks
|
||||
from kitsune.search.tests import ElasticTestCase
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.wiki.models import DocumentMappingType
|
||||
from kitsune.wiki.tests import DocumentFactory, ApprovedRevisionFactory
|
||||
|
||||
|
||||
class ElasticSearchSuggestionsTests(ElasticTestCase):
|
||||
@mock.patch.object(Site.objects, "get_current")
|
||||
def test_invalid_suggestions(self, get_current):
|
||||
"""The suggestions API needs a query term."""
|
||||
get_current.return_value.domain = "testserver"
|
||||
response = self.client.get(reverse("search.suggestions", locale="en-US"))
|
||||
eq_(400, response.status_code)
|
||||
assert not response.content
|
||||
|
||||
@mock.patch.object(Site.objects, "get_current")
|
||||
def test_suggestions(self, get_current):
|
||||
"""Suggestions API is well-formatted."""
|
||||
get_current.return_value.domain = "testserver"
|
||||
|
||||
doc = DocumentFactory(title="doc1 audio", locale="en-US", is_archived=False)
|
||||
ApprovedRevisionFactory(document=doc, summary="audio", content="audio")
|
||||
|
||||
ques = QuestionFactory(title="q1 audio", tags=["desktop"])
|
||||
# ques.tags.add(u'desktop')
|
||||
ans = AnswerFactory(question=ques)
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
response = self.client.get(reverse("search.suggestions", locale="en-US"), {"q": "audio"})
|
||||
eq_(200, response.status_code)
|
||||
eq_("application/x-suggestions+json", response["content-type"])
|
||||
results = json.loads(response.content)
|
||||
|
||||
eq_("audio", results[0])
|
||||
eq_(2, len(results[1]))
|
||||
eq_(0, len(results[2]))
|
||||
eq_(2, len(results[3]))
|
||||
|
||||
|
||||
class TestUtils(ElasticTestCase):
|
||||
def test_get_documents(self):
|
||||
q = QuestionFactory()
|
||||
self.refresh()
|
||||
|
||||
docs = es_utils.get_documents(QuestionMappingType, [q.id])
|
||||
eq_(docs[0]["id"], q.id)
|
||||
|
||||
|
||||
class TestTasks(ElasticTestCase):
|
||||
@mock.patch.object(QuestionMappingType, "index")
|
||||
def test_tasks(self, index_fun):
|
||||
"""Tests to make sure tasks are added and run"""
|
||||
q = QuestionFactory()
|
||||
# Don't call self.refresh here since that calls generate_tasks().
|
||||
|
||||
eq_(index_fun.call_count, 0)
|
||||
|
||||
q.save()
|
||||
generate_tasks()
|
||||
|
||||
eq_(index_fun.call_count, 1)
|
||||
|
||||
@mock.patch.object(QuestionMappingType, "index")
|
||||
def test_tasks_squashed(self, index_fun):
|
||||
"""Tests to make sure tasks are squashed"""
|
||||
q = QuestionFactory()
|
||||
# Don't call self.refresh here since that calls generate_tasks().
|
||||
|
||||
eq_(index_fun.call_count, 0)
|
||||
|
||||
q.save()
|
||||
q.save()
|
||||
q.save()
|
||||
q.save()
|
||||
|
||||
eq_(index_fun.call_count, 0)
|
||||
|
||||
generate_tasks()
|
||||
|
||||
eq_(index_fun.call_count, 1)
|
||||
|
||||
|
||||
class TestMappings(unittest.TestCase):
|
||||
def test_mappings(self):
|
||||
# This is more of a linter than a test. If it passes, then
|
||||
# everything is fine. If it fails, then it means things are
|
||||
# not fine. Not fine? Yeah, it means that there are two fields
|
||||
# with the same name, but different types in the
|
||||
# mappings that share an index. That doesn't work in ES.
|
||||
|
||||
# Doing it as a test seemed like a good idea since
|
||||
# it's likely to catch epic problems, but isn't in the runtime
|
||||
# code.
|
||||
|
||||
# Verify mappings that share the same index don't conflict
|
||||
for index in es_utils.all_read_indexes():
|
||||
merged_mapping = {}
|
||||
|
||||
for cls_name, mapping in list(es_utils.get_mappings(index).items()):
|
||||
mapping = mapping["properties"]
|
||||
|
||||
for key, val in list(mapping.items()):
|
||||
if key not in merged_mapping:
|
||||
merged_mapping[key] = (val, [cls_name])
|
||||
continue
|
||||
|
||||
# FIXME - We're comparing two dicts here. This might
|
||||
# not work for non-trivial dicts.
|
||||
if merged_mapping[key][0] != val:
|
||||
raise es_utils.MappingMergeError(
|
||||
"%s key different for %s and %s"
|
||||
% (key, cls_name, merged_mapping[key][1])
|
||||
)
|
||||
|
||||
merged_mapping[key][1].append(cls_name)
|
||||
|
||||
# If we get here, then we're fine.
|
||||
|
||||
|
||||
class TestAnalyzers(ElasticTestCase):
|
||||
def setUp(self):
|
||||
super(TestAnalyzers, self).setUp()
|
||||
|
||||
self.locale_data = {
|
||||
"en-US": {
|
||||
"analyzer": "snowball-english",
|
||||
"content": "I have a cat.",
|
||||
},
|
||||
"es": {
|
||||
"analyzer": "snowball-spanish",
|
||||
"content": "Tieno un gato.",
|
||||
},
|
||||
"ar": {
|
||||
"analyzer": "arabic",
|
||||
"content": "لدي اثنين من القطط",
|
||||
},
|
||||
"he": {
|
||||
"analyzer": "standard",
|
||||
"content": "גאולוגיה היא אחד",
|
||||
},
|
||||
}
|
||||
|
||||
self.docs = {}
|
||||
for locale, data in list(self.locale_data.items()):
|
||||
d = DocumentFactory(locale=locale)
|
||||
ApprovedRevisionFactory(document=d, content=data["content"])
|
||||
self.locale_data[locale]["doc"] = d
|
||||
|
||||
self.refresh()
|
||||
|
||||
def test_analyzer_choices(self):
|
||||
"""Check that the indexer picked the right analyzer."""
|
||||
|
||||
ids = [d.id for d in list(self.docs.values())]
|
||||
docs = es_utils.get_documents(DocumentMappingType, ids)
|
||||
for doc in docs:
|
||||
locale = doc["locale"]
|
||||
eq_(doc["_analyzer"], self.locale_data[locale]["analyzer"])
|
||||
|
||||
def test_query_analyzer_upgrader(self):
|
||||
analyzer = "snowball-english-synonyms"
|
||||
before = {
|
||||
"document_title__match": "foo",
|
||||
"document_locale__match": "bar",
|
||||
"document_title__match_phrase": "baz",
|
||||
"document_locale__match_phrase": "qux",
|
||||
}
|
||||
expected = {
|
||||
"document_title__match_analyzer": ("foo", analyzer),
|
||||
"document_locale__match": "bar",
|
||||
"document_title__match_phrase_analyzer": ("baz", analyzer),
|
||||
"document_locale__match_phrase": "qux",
|
||||
}
|
||||
actual = es_utils.es_query_with_analyzer(before, "en-US")
|
||||
eq_(actual, expected)
|
||||
|
||||
def _check_locale_tokenization(self, locale, expected_tokens, p_tag=True):
|
||||
"""
|
||||
Check that a given locale's document was tokenized correctly.
|
||||
|
||||
* `locale` - The locale to check.
|
||||
* `expected_tokens` - An iterable of the tokens that should be
|
||||
found. If any tokens from this list are missing, or if any
|
||||
tokens not in this list are found, the check will fail.
|
||||
* `p_tag` - Default True. If True, an extra token will be added
|
||||
to `expected_tokens`: "p".
|
||||
|
||||
This is because our wiki parser wraps its content in <p>
|
||||
tags and many analyzers will tokenize a string like
|
||||
'<p>Foo</p>' as ['p', 'foo'] (the HTML tag is included in
|
||||
the tokenization). So this will show up in the tokenization
|
||||
during this test. Not all the analyzers do this, which is
|
||||
why it can be turned off.
|
||||
|
||||
Why can't we fix the analyzers to strip out that HTML, and not
|
||||
generate spurious tokens? That could probably be done, but it
|
||||
probably isn't worthwhile because:
|
||||
|
||||
* ES will weight common words lower, thanks to its TF-IDF
|
||||
algorithms, which judge words based on how often they
|
||||
appear in the entire corpus and in the document, so the p
|
||||
tokens will be largely ignored.
|
||||
* The pre-l10n search code did it this way, so it doesn't
|
||||
break search.
|
||||
* When implementing l10n search, I wanted to minimize the
|
||||
number of changes needed, and this seemed like an unneeded
|
||||
change.
|
||||
"""
|
||||
|
||||
search = es_utils.Sphilastic(DocumentMappingType)
|
||||
search = search.filter(document_locale=locale)
|
||||
facet_filter = search._process_filters([("document_locale", locale)])
|
||||
search = search.facet_raw(
|
||||
tokens={"terms": {"field": "document_content"}, "facet_filter": facet_filter}
|
||||
)
|
||||
facets = search.facet_counts()
|
||||
|
||||
expected = set(expected_tokens)
|
||||
if p_tag:
|
||||
# Since `expected` is a set, there is no problem adding this
|
||||
# twice, since duplicates will be ignored.
|
||||
expected.add("p")
|
||||
actual = set(t["term"] for t in facets["tokens"])
|
||||
eq_(actual, expected)
|
||||
|
||||
# These 4 languages were chosen for tokenization testing because
|
||||
# they represent the 4 kinds of languages we have: English, Snowball
|
||||
# supported languages, ES supported languages and languages with no
|
||||
# analyzer, which use the standard analyzer. There is another
|
||||
# possible case, which is a custom analyzer, but we don't have any
|
||||
# of those right now.
|
||||
|
||||
def test_english_tokenization(self):
|
||||
"""Test that English stemming and stop words work."""
|
||||
self._check_locale_tokenization("en-US", ["i", "have", "cat"])
|
||||
|
||||
def test_spanish_tokenization(self):
|
||||
"""Test that Spanish stemming and stop words work."""
|
||||
self._check_locale_tokenization("es", ["tien", "un", "gat"])
|
||||
|
||||
def test_arabic_tokenization(self):
|
||||
"""Test that Arabic stemming works.
|
||||
|
||||
I don't read Arabic, this is just what ES gave me when I asked
|
||||
it to analyze an Arabic text as Arabic. If someone who reads
|
||||
Arabic can improve this test, go for it!
|
||||
"""
|
||||
self._check_locale_tokenization("ar", ["لد", "اثن", "قطط"])
|
||||
|
||||
def test_herbrew_tokenization(self):
|
||||
"""Test that Hebrew uses the standard analyzer."""
|
||||
tokens = ["גאולוגיה", "היא", "אחד"]
|
||||
self._check_locale_tokenization("he", tokens)
|
||||
|
||||
|
||||
class TestGetAnalyzerForLocale(ElasticTestCase):
|
||||
def test_default(self):
|
||||
actual = es_utils.es_analyzer_for_locale("en-US")
|
||||
eq_("snowball-english", actual)
|
||||
|
||||
def test_without_synonyms(self):
|
||||
actual = es_utils.es_analyzer_for_locale("en-US", synonyms=False)
|
||||
eq_("snowball-english", actual)
|
||||
|
||||
def test_with_synonyms_right_locale(self):
|
||||
actual = es_utils.es_analyzer_for_locale("en-US", synonyms=True)
|
||||
eq_("snowball-english-synonyms", actual)
|
||||
|
||||
def test_with_synonyms_wrong_locale(self):
|
||||
actual = es_utils.es_analyzer_for_locale("es", synonyms=True)
|
||||
eq_("snowball-spanish", actual)
|
|
@ -13,7 +13,6 @@ class OpenSearchTestCase(TestCase):
|
|||
# FIXME: This is silly. The better test would be to parse out
|
||||
# the content and then go through and make sure all the urls
|
||||
# were correct.
|
||||
assert b"http://testserver/fr/search/suggestions" in response.content
|
||||
assert b"en-US" not in response.content
|
||||
|
||||
def test_plugin_expires_and_mimetype(self):
|
||||
|
|
|
@ -1,307 +0,0 @@
|
|||
import json
|
||||
|
||||
from django.conf import settings
|
||||
from django.utils.http import urlquote
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery as pq
|
||||
|
||||
from kitsune.forums.tests import PostFactory, ThreadFactory
|
||||
from kitsune.products.tests import ProductFactory
|
||||
from kitsune.questions.tests import AnswerFactory, AnswerVoteFactory, QuestionFactory
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.sumo.tests import LocalizingClient
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, RevisionFactory
|
||||
|
||||
|
||||
class SimpleSearchTests(ElasticTestCase):
|
||||
client_class = LocalizingClient
|
||||
|
||||
def test_content(self):
|
||||
"""Ensure template is rendered with no errors for a common search"""
|
||||
response = self.client.get(reverse("search"), {"q": "audio"})
|
||||
eq_("text/html; charset=utf-8", response["Content-Type"])
|
||||
eq_(200, response.status_code)
|
||||
|
||||
def test_search_type_param(self):
|
||||
"""Ensure that invalid values for search type (a=)
|
||||
does not cause errors"""
|
||||
response = self.client.get(reverse("search"), {"a": "dontdie"})
|
||||
eq_("text/html; charset=utf-8", response["Content-Type"])
|
||||
eq_(200, response.status_code)
|
||||
|
||||
def test_headers(self):
|
||||
"""Verify caching headers of search forms and search results"""
|
||||
response = self.client.get(reverse("search"), {"q": "audio"})
|
||||
eq_("max-age=%s" % (settings.SEARCH_CACHE_PERIOD * 60), response["Cache-Control"])
|
||||
assert "Expires" in response
|
||||
response = self.client.get(reverse("search"))
|
||||
eq_("max-age=%s" % (settings.SEARCH_CACHE_PERIOD * 60), response["Cache-Control"])
|
||||
assert "Expires" in response
|
||||
|
||||
def test_json_format(self):
|
||||
"""JSON without callback should return application/json"""
|
||||
response = self.client.get(
|
||||
reverse("search"),
|
||||
{
|
||||
"q": "bookmarks",
|
||||
"format": "json",
|
||||
},
|
||||
)
|
||||
eq_(response["Content-Type"], "application/json")
|
||||
|
||||
def test_json_callback_validation(self):
|
||||
"""Various json callbacks -- validation"""
|
||||
response = self.client.get(
|
||||
reverse("search"),
|
||||
{
|
||||
"q": "bookmarks",
|
||||
"format": "json",
|
||||
"callback": "callback",
|
||||
},
|
||||
)
|
||||
eq_(response["Content-Type"], "application/x-javascript")
|
||||
eq_(response.status_code, 200)
|
||||
|
||||
def test_page_invalid(self):
|
||||
"""Ensure non-integer param doesn't throw exception."""
|
||||
doc = DocumentFactory(
|
||||
title="How to fix your audio", locale="en-US", category=10, tags="desktop"
|
||||
)
|
||||
ApprovedRevisionFactory(document=doc)
|
||||
|
||||
self.refresh()
|
||||
|
||||
response = self.client.get(
|
||||
reverse("search"), {"q": "audio", "format": "json", "page": "invalid"}
|
||||
)
|
||||
eq_(200, response.status_code)
|
||||
eq_(1, json.loads(response.content)["total"])
|
||||
|
||||
def test_clean_question_excerpt(self):
|
||||
"""Ensure we clean html out of question excerpts."""
|
||||
q = QuestionFactory(title="audio", content='<script>alert("hacked");</script>')
|
||||
a = AnswerFactory(question=q)
|
||||
AnswerVoteFactory(answer=a, helpful=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
response = self.client.get(reverse("search"), {"q": "audio"})
|
||||
eq_(200, response.status_code)
|
||||
|
||||
doc = pq(response.content)
|
||||
assert "script" not in doc("div.result").text()
|
||||
|
||||
def test_fallback_for_zero_results(self):
|
||||
"""If there are no results, fallback to a list of top articles."""
|
||||
firefox = ProductFactory(title="firefox", slug="desktop")
|
||||
doc = DocumentFactory(title="audio1", locale="en-US", category=10, products=[firefox])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
doc = DocumentFactory(title="audio2", locale="en-US", category=10, products=[firefox])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# Verify there are no real results but 2 fallback results are rendered
|
||||
response = self.client.get(reverse("search"), {"q": "piranha"})
|
||||
eq_(200, response.status_code)
|
||||
|
||||
assert b"We couldn't find any results for" in response.content
|
||||
doc = pq(response.content)
|
||||
eq_(2, len(doc("#search-results .result")))
|
||||
|
||||
def test_meta_tags(self):
|
||||
"""Tests that the search results page has the right meta tags"""
|
||||
url_ = reverse("search")
|
||||
response = self.client.get(url_, {"q": "contribute"})
|
||||
|
||||
doc = pq(response.content)
|
||||
eq_(doc('meta[name="WT.oss"]')[0].attrib["content"], "contribute")
|
||||
eq_(doc('meta[name="WT.oss_r"]')[0].attrib["content"], "0")
|
||||
eq_(doc('meta[name="robots"]')[0].attrib["content"], "noindex")
|
||||
|
||||
def test_search_cookie(self):
|
||||
"""Set a cookie with the latest search term."""
|
||||
data = {"q": "pagap\xf3 banco"}
|
||||
cookie = settings.LAST_SEARCH_COOKIE
|
||||
response = self.client.get(reverse("search", locale="fr"), data)
|
||||
assert cookie in response.cookies
|
||||
eq_(urlquote(data["q"]), response.cookies[cookie].value)
|
||||
|
||||
def test_empty_pages(self):
|
||||
"""Tests requesting a page that has no results"""
|
||||
ques = QuestionFactory(title="audio")
|
||||
ques.tags.add("desktop")
|
||||
ans = AnswerFactory(question=ques, content="volume")
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
qs = {"q": "audio", "page": 81}
|
||||
response = self.client.get(reverse("search"), qs)
|
||||
eq_(200, response.status_code)
|
||||
|
||||
def test_include_questions(self):
|
||||
"""This tests whether doing a simple search returns
|
||||
question results.
|
||||
|
||||
Bug #709202.
|
||||
|
||||
"""
|
||||
# Create a question with an answer with an answervote that
|
||||
# marks the answer as helpful. The question should have the
|
||||
# "desktop" tag.
|
||||
p = ProductFactory(title="firefox", slug="desktop")
|
||||
ques = QuestionFactory(title="audio", product=p)
|
||||
ans = AnswerFactory(question=ques, content="volume")
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# This is the search that you get when you start on the sumo
|
||||
# homepage and do a search from the box with two differences:
|
||||
# first, we do it in json since it's easier to deal with
|
||||
# testing-wise and second, we search for 'audio' since we have
|
||||
# data for that.
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
|
||||
|
||||
eq_(200, response.status_code)
|
||||
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 1)
|
||||
|
||||
# This is another search that picks up results based on the
|
||||
# answer_content. answer_content is in a string array, so
|
||||
# this makes sure that works.
|
||||
response = self.client.get(reverse("search"), {"q": "volume", "format": "json"})
|
||||
|
||||
eq_(200, response.status_code)
|
||||
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 1)
|
||||
|
||||
def test_include_wiki(self):
|
||||
"""This tests whether doing a simple search returns wiki document
|
||||
results.
|
||||
|
||||
Bug #709202.
|
||||
|
||||
"""
|
||||
doc = DocumentFactory(title="audio", locale="en-US", category=10)
|
||||
doc.products.add(ProductFactory(title="firefox", slug="desktop"))
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# This is the search that you get when you start on the sumo
|
||||
# homepage and do a search from the box with two differences:
|
||||
# first, we do it in json since it's easier to deal with
|
||||
# testing-wise and second, we search for 'audio' since we have
|
||||
# data for that.
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
|
||||
|
||||
eq_(200, response.status_code)
|
||||
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 1)
|
||||
|
||||
def test_only_show_wiki_and_questions(self):
|
||||
"""Tests that the simple search doesn't show forums
|
||||
|
||||
This verifies that we're only showing documents of the type
|
||||
that should be shown and that the filters on model are working
|
||||
correctly.
|
||||
|
||||
Bug #767394
|
||||
|
||||
"""
|
||||
p = ProductFactory(slug="desktop")
|
||||
ques = QuestionFactory(title="audio", product=p)
|
||||
ans = AnswerFactory(question=ques, content="volume")
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
doc = DocumentFactory(title="audio", locale="en-US", category=10)
|
||||
doc.products.add(p)
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
thread1 = ThreadFactory(title="audio")
|
||||
PostFactory(thread=thread1)
|
||||
|
||||
self.refresh()
|
||||
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
|
||||
|
||||
eq_(200, response.status_code)
|
||||
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 2)
|
||||
|
||||
# Archive the article and question. They should no longer appear
|
||||
# in simple search results.
|
||||
ques.is_archived = True
|
||||
ques.save()
|
||||
doc.is_archived = True
|
||||
doc.save()
|
||||
|
||||
self.refresh()
|
||||
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
|
||||
|
||||
eq_(200, response.status_code)
|
||||
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 0)
|
||||
|
||||
def test_filter_by_product(self):
|
||||
desktop = ProductFactory(slug="desktop")
|
||||
mobile = ProductFactory(slug="mobile")
|
||||
ques = QuestionFactory(title="audio", product=desktop)
|
||||
ans = AnswerFactory(question=ques, content="volume")
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
doc = DocumentFactory(title="audio", locale="en-US", category=10)
|
||||
doc.products.add(desktop)
|
||||
doc.products.add(mobile)
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# There should be 2 results for desktop and 1 for mobile.
|
||||
response = self.client.get(
|
||||
reverse("search"), {"q": "audio", "format": "json", "product": "desktop"}
|
||||
)
|
||||
eq_(200, response.status_code)
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 2)
|
||||
|
||||
response = self.client.get(
|
||||
reverse("search"), {"q": "audio", "format": "json", "product": "mobile"}
|
||||
)
|
||||
eq_(200, response.status_code)
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 1)
|
||||
|
||||
def test_filter_by_doctype(self):
|
||||
desktop = ProductFactory(slug="desktop")
|
||||
ques = QuestionFactory(title="audio", product=desktop)
|
||||
ans = AnswerFactory(question=ques, content="volume")
|
||||
AnswerVoteFactory(answer=ans, helpful=True)
|
||||
|
||||
doc = DocumentFactory(title="audio", locale="en-US", category=10, products=[desktop])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
doc = DocumentFactory(title="audio too", locale="en-US", category=10, products=[desktop])
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# There should be 2 results for kb (w=1) and 1 for questions (w=2).
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json", "w": "1"})
|
||||
eq_(200, response.status_code)
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 2)
|
||||
|
||||
response = self.client.get(reverse("search"), {"q": "audio", "format": "json", "w": "2"})
|
||||
eq_(200, response.status_code)
|
||||
content = json.loads(response.content)
|
||||
eq_(content["total"], 1)
|
|
@ -1,55 +0,0 @@
|
|||
from nose.tools import ok_
|
||||
|
||||
from kitsune.search.forms import SimpleSearchForm
|
||||
from kitsune.search.search_utils import generate_simple_search
|
||||
from kitsune.sumo.tests import TestCase
|
||||
|
||||
|
||||
class SimpleSearchTests(TestCase):
|
||||
def test_language_en_us(self):
|
||||
form = SimpleSearchForm({"q": "foo"})
|
||||
ok_(form.is_valid())
|
||||
|
||||
s = generate_simple_search(form, "en-US", with_highlights=False)
|
||||
|
||||
# NB: Comparing bits of big trees is hard, so we serialize it
|
||||
# and look for strings.
|
||||
s_string = str(s.build_search())
|
||||
# Verify locale
|
||||
ok_("{'term': {'document_locale': 'en-US'}}" in s_string)
|
||||
# Verify en-US has the right synonym-enhanced analyzer
|
||||
ok_("'analyzer': 'snowball-english-synonyms'" in s_string)
|
||||
|
||||
def test_language_fr(self):
|
||||
form = SimpleSearchForm({"q": "foo"})
|
||||
ok_(form.is_valid())
|
||||
|
||||
s = generate_simple_search(form, "fr", with_highlights=False)
|
||||
|
||||
s_string = str(s.build_search())
|
||||
# Verify locale
|
||||
ok_("{'term': {'document_locale': 'fr'}}" in s_string)
|
||||
# Verify fr has right synonym-less analyzer
|
||||
ok_("'analyzer': 'snowball-french'" in s_string)
|
||||
|
||||
def test_language_zh_cn(self):
|
||||
form = SimpleSearchForm({"q": "foo"})
|
||||
ok_(form.is_valid())
|
||||
|
||||
s = generate_simple_search(form, "zh-CN", with_highlights=False)
|
||||
|
||||
s_string = str(s.build_search())
|
||||
# Verify locale
|
||||
ok_("{'term': {'document_locale': 'zh-CN'}}" in s_string)
|
||||
# Verify standard analyzer is used
|
||||
ok_("'analyzer': 'chinese'" in s_string)
|
||||
|
||||
def test_with_highlights(self):
|
||||
form = SimpleSearchForm({"q": "foo"})
|
||||
ok_(form.is_valid())
|
||||
|
||||
s = generate_simple_search(form, "en-US", with_highlights=True)
|
||||
ok_("highlight" in s.build_search())
|
||||
|
||||
s = generate_simple_search(form, "en-US", with_highlights=False)
|
||||
ok_("highlight" not in s.build_search())
|
|
@ -1,118 +0,0 @@
|
|||
from textwrap import dedent
|
||||
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery as pq
|
||||
|
||||
from kitsune.search import es_utils, synonym_utils
|
||||
from kitsune.search.tasks import update_synonyms_task
|
||||
from kitsune.search.tests import ElasticTestCase, SynonymFactory
|
||||
from kitsune.sumo.tests import LocalizingClient, TestCase
|
||||
from kitsune.sumo.urlresolvers import reverse
|
||||
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
|
||||
|
||||
|
||||
class TestSynonymModel(TestCase):
|
||||
def test_serialize(self):
|
||||
syn = SynonymFactory(from_words="foo", to_words="bar")
|
||||
eq_("foo => bar", str(syn))
|
||||
|
||||
|
||||
class TestFilterGenerator(TestCase):
|
||||
def test_name(self):
|
||||
"""Test that the right name is returned."""
|
||||
name, _ = es_utils.es_get_synonym_filter("en-US")
|
||||
eq_(name, "synonyms-en-US")
|
||||
|
||||
def test_no_synonyms(self):
|
||||
"""Test that when there are no synonyms an alternate filter is made."""
|
||||
_, body = es_utils.es_get_synonym_filter("en-US")
|
||||
eq_(
|
||||
body,
|
||||
{
|
||||
"type": "synonym",
|
||||
"synonyms": ["firefox => firefox"],
|
||||
},
|
||||
)
|
||||
|
||||
def test_with_some_synonyms(self):
|
||||
SynonymFactory(from_words="foo", to_words="bar")
|
||||
SynonymFactory(from_words="baz", to_words="qux")
|
||||
|
||||
_, body = es_utils.es_get_synonym_filter("en-US")
|
||||
|
||||
expected = {
|
||||
"type": "synonym",
|
||||
"synonyms": [
|
||||
"foo => bar",
|
||||
"baz => qux",
|
||||
],
|
||||
}
|
||||
eq_(body, expected)
|
||||
|
||||
|
||||
class TestSynonymParser(TestCase):
|
||||
def testItWorks(self):
|
||||
synonym_text = dedent(
|
||||
"""
|
||||
one, two => apple, banana
|
||||
three => orange, grape
|
||||
four, five => jellybean
|
||||
"""
|
||||
)
|
||||
synonyms = {
|
||||
("one, two", "apple, banana"),
|
||||
("three", "orange, grape"),
|
||||
("four, five", "jellybean"),
|
||||
}
|
||||
eq_(synonyms, synonym_utils.parse_synonyms(synonym_text))
|
||||
|
||||
def testTooManyArrows(self):
|
||||
try:
|
||||
synonym_utils.parse_synonyms("foo => bar => baz")
|
||||
except synonym_utils.SynonymParseError as e:
|
||||
eq_(len(e.errors), 1)
|
||||
else:
|
||||
assert False, "Parser did not catch error as expected."
|
||||
|
||||
def testTooFewArrows(self):
|
||||
try:
|
||||
synonym_utils.parse_synonyms("foo, bar, baz")
|
||||
except synonym_utils.SynonymParseError as e:
|
||||
eq_(len(e.errors), 1)
|
||||
else:
|
||||
assert False, "Parser did not catch error as expected."
|
||||
|
||||
|
||||
class SearchViewWithSynonyms(ElasticTestCase):
|
||||
client_class = LocalizingClient
|
||||
|
||||
def test_synonyms_work_in_search_view(self):
|
||||
d1 = DocumentFactory(title="frob")
|
||||
d2 = DocumentFactory(title="glork")
|
||||
RevisionFactory(document=d1, is_approved=True)
|
||||
RevisionFactory(document=d2, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
# First search without synonyms
|
||||
response = self.client.get(reverse("search"), {"q": "frob"})
|
||||
doc = pq(response.content)
|
||||
header = doc.find("#search-results h2").text().strip()
|
||||
eq_(header, "Found 1 result for frob for All Products")
|
||||
|
||||
# Now add a synonym.
|
||||
SynonymFactory(from_words="frob", to_words="frob, glork")
|
||||
update_synonyms_task()
|
||||
self.refresh()
|
||||
|
||||
# Forward search
|
||||
response = self.client.get(reverse("search"), {"q": "frob"})
|
||||
doc = pq(response.content)
|
||||
header = doc.find("#search-results h2").text().strip()
|
||||
eq_(header, "Found 2 results for frob for All Products")
|
||||
|
||||
# Reverse search
|
||||
response = self.client.get(reverse("search"), {"q": "glork"})
|
||||
doc = pq(response.content)
|
||||
header = doc.find("#search-results h2").text().strip()
|
||||
eq_(header, "Found 1 result for glork for All Products")
|
|
@ -1,36 +0,0 @@
|
|||
from nose.tools import eq_
|
||||
|
||||
from kitsune.search.utils import chunked, from_class_path, to_class_path
|
||||
from kitsune.sumo.tests import TestCase
|
||||
|
||||
|
||||
class ChunkedTests(TestCase):
|
||||
def test_chunked(self):
|
||||
# chunking nothing yields nothing.
|
||||
eq_(list(chunked([], 1)), [])
|
||||
|
||||
# chunking list where len(list) < n
|
||||
eq_(list(chunked([1], 10)), [(1,)])
|
||||
|
||||
# chunking a list where len(list) == n
|
||||
eq_(list(chunked([1, 2], 2)), [(1, 2)])
|
||||
|
||||
# chunking list where len(list) > n
|
||||
eq_(list(chunked([1, 2, 3, 4, 5], 2)), [(1, 2), (3, 4), (5,)])
|
||||
|
||||
|
||||
class FooBarClassOfAwesome(object):
|
||||
pass
|
||||
|
||||
|
||||
def test_from_class_path():
|
||||
eq_(
|
||||
from_class_path("kitsune.search.tests.test_utils:FooBarClassOfAwesome"),
|
||||
FooBarClassOfAwesome,
|
||||
)
|
||||
|
||||
|
||||
def test_to_class_path():
|
||||
eq_(
|
||||
to_class_path(FooBarClassOfAwesome), "kitsune.search.tests.test_utils:FooBarClassOfAwesome"
|
||||
)
|
|
@ -6,5 +6,4 @@ from kitsune.search.v2 import views as v2_views
|
|||
urlpatterns = [
|
||||
url(r"^$", v2_views.simple_search, name="search"),
|
||||
url(r"^/xml$", views.opensearch_plugin, name="search.plugin"),
|
||||
url(r"^/suggestions$", views.opensearch_suggestions, name="search.suggestions"),
|
||||
]
|
||||
|
|
|
@ -1,8 +0,0 @@
|
|||
from django.conf.urls import url
|
||||
|
||||
from kitsune.search import api
|
||||
|
||||
# API urls. Prefixed with /api/2/
|
||||
urlpatterns = [
|
||||
url("^search/suggest/$", api.suggest, name="search.suggest"),
|
||||
]
|
|
@ -1,43 +1,8 @@
|
|||
import time
|
||||
from itertools import islice
|
||||
|
||||
from django.conf import settings
|
||||
|
||||
import bleach
|
||||
|
||||
from kitsune.lib.sumo_locales import LOCALES
|
||||
|
||||
|
||||
class FakeLogger(object):
|
||||
"""Fake logger that we can pretend is a Python Logger
|
||||
|
||||
Why? Well, because Django has logging settings that prevent me
|
||||
from setting up a logger here that uses the stdout that the Django
|
||||
BaseCommand has. At some point while fiddling with it, I
|
||||
figured, 'screw it--I'll just write my own' and did.
|
||||
|
||||
The minor ramification is that this isn't a complete
|
||||
implementation so if it's missing stuff, we'll have to add it.
|
||||
"""
|
||||
|
||||
def __init__(self, stdout):
|
||||
self.stdout = stdout
|
||||
|
||||
def _out(self, level, msg, *args):
|
||||
msg = msg % args
|
||||
self.stdout.write("%s %-8s: %s\n" % (time.strftime("%H:%M:%S"), level, msg))
|
||||
|
||||
def info(self, msg, *args):
|
||||
self._out("INFO", msg, *args)
|
||||
|
||||
def error(self, msg, *args):
|
||||
self._out("ERROR", msg, *args)
|
||||
|
||||
|
||||
def clean_excerpt(excerpt):
|
||||
return bleach.clean(excerpt, tags=["b", "i"])
|
||||
|
||||
|
||||
def locale_or_default(locale):
|
||||
"""Return `locale` or, if `locale` isn't a known locale, a default.
|
||||
|
||||
|
@ -47,57 +12,3 @@ def locale_or_default(locale):
|
|||
if locale not in LOCALES:
|
||||
locale = settings.LANGUAGE_CODE
|
||||
return locale
|
||||
|
||||
|
||||
def chunked(iterable, n):
|
||||
"""Returns chunks of n length of iterable
|
||||
|
||||
If len(iterable) % n != 0, then the last chunk will have length
|
||||
less than n.
|
||||
|
||||
Example:
|
||||
|
||||
>>> chunked([1, 2, 3, 4, 5], 2)
|
||||
[(1, 2), (3, 4), (5,)]
|
||||
|
||||
"""
|
||||
iterable = iter(iterable)
|
||||
while True:
|
||||
t = tuple(islice(iterable, n))
|
||||
if t:
|
||||
yield t
|
||||
else:
|
||||
return
|
||||
|
||||
|
||||
def to_class_path(cls):
|
||||
"""Returns class path for a class
|
||||
|
||||
Takes a class and returns the class path which is composed of the
|
||||
module plus the class name. This can be reversed later to get the
|
||||
class using ``from_class_path``.
|
||||
|
||||
:returns: string
|
||||
|
||||
>>> from kitsune.search.models import Record
|
||||
>>> to_class_path(Record)
|
||||
'kitsune.search.models:Record'
|
||||
|
||||
"""
|
||||
return ":".join([cls.__module__, cls.__name__])
|
||||
|
||||
|
||||
def from_class_path(cls_path):
|
||||
"""Returns the class
|
||||
|
||||
Takes a class path and returns the class for it.
|
||||
|
||||
:returns: varies
|
||||
|
||||
>>> from_class_path('kitsune.search.models:Record')
|
||||
<Record ...>
|
||||
|
||||
"""
|
||||
module_path, cls_name = cls_path.split(":")
|
||||
module = __import__(module_path, fromlist=[cls_name])
|
||||
return getattr(module, cls_name)
|
||||
|
|
|
@ -1,32 +1,15 @@
|
|||
import json
|
||||
import logging
|
||||
from datetime import datetime, timedelta
|
||||
from itertools import chain
|
||||
|
||||
import bleach
|
||||
import jinja2
|
||||
from django.http import HttpResponse, HttpResponseBadRequest
|
||||
from django.shortcuts import render_to_response
|
||||
from django.utils.html import escape
|
||||
from django.utils.translation import pgettext_lazy
|
||||
from django.utils.translation import ugettext as _
|
||||
from django.views.decorators.cache import cache_page
|
||||
from elasticutils.contrib.django import ES_EXCEPTIONS
|
||||
from elasticutils.utils import format_explanation
|
||||
|
||||
from kitsune import search as constants
|
||||
from kitsune.products.models import Product
|
||||
from kitsune.search.forms import SimpleSearchForm
|
||||
from kitsune.search.search_utils import generate_simple_search
|
||||
from kitsune.search.utils import clean_excerpt, locale_or_default
|
||||
from kitsune.wiki.facets import documents_for
|
||||
|
||||
log = logging.getLogger("k.search")
|
||||
|
||||
|
||||
EXCERPT_JOINER = pgettext_lazy("between search excerpts", "...")
|
||||
|
||||
|
||||
def cache_control(resp, cache_period):
|
||||
"""Inserts cache/expires headers"""
|
||||
resp["Cache-Control"] = "max-age=%s" % (cache_period * 60)
|
||||
|
@ -36,110 +19,6 @@ def cache_control(resp, cache_period):
|
|||
return resp
|
||||
|
||||
|
||||
def _es_down_template(request, *args, **kwargs):
|
||||
"""Returns the appropriate "Elasticsearch is down!" template"""
|
||||
return "search/down.html"
|
||||
|
||||
|
||||
class UnknownDocType(Exception):
|
||||
"""Signifies a doctype for which there's no handling"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
def build_results_list(pages, is_json):
|
||||
"""Takes a paginated search and returns results List
|
||||
|
||||
Handles wiki documents, questions and contributor forum posts.
|
||||
|
||||
:arg pages: paginated S
|
||||
:arg is_json: whether or not this is generated results for json output
|
||||
|
||||
:returns: list of dicts
|
||||
|
||||
"""
|
||||
results = []
|
||||
for rank, doc in enumerate(pages, pages.start_index()):
|
||||
if doc["model"] == "wiki_document":
|
||||
summary = _build_es_excerpt(doc)
|
||||
if not summary:
|
||||
summary = doc["document_summary"]
|
||||
result = {"title": doc["document_title"], "type": "document"}
|
||||
|
||||
elif doc["model"] == "questions_question":
|
||||
summary = _build_es_excerpt(doc)
|
||||
if not summary:
|
||||
# We're excerpting only question_content, so if the query matched
|
||||
# question_title or question_answer_content, then there won't be any
|
||||
# question_content excerpts. In that case, just show the question--but
|
||||
# only the first 500 characters.
|
||||
summary = bleach.clean(doc["question_content"], strip=True)[:500]
|
||||
|
||||
result = {
|
||||
"title": doc["question_title"],
|
||||
"type": "question",
|
||||
"last_updated": datetime.fromtimestamp(doc["updated"]),
|
||||
"is_solved": doc["question_is_solved"],
|
||||
"num_answers": doc["question_num_answers"],
|
||||
"num_votes": doc["question_num_votes"],
|
||||
"num_votes_past_week": doc["question_num_votes_past_week"],
|
||||
}
|
||||
|
||||
elif doc["model"] == "forums_thread":
|
||||
summary = _build_es_excerpt(doc, first_only=True)
|
||||
result = {"title": doc["post_title"], "type": "thread"}
|
||||
|
||||
else:
|
||||
raise UnknownDocType("%s is an unknown doctype" % doc["model"])
|
||||
|
||||
result["url"] = doc["url"]
|
||||
if not is_json:
|
||||
result["object"] = doc
|
||||
result["search_summary"] = summary
|
||||
result["rank"] = rank
|
||||
result["score"] = doc.es_meta.score
|
||||
result["explanation"] = escape(format_explanation(doc.es_meta.explanation))
|
||||
result["id"] = doc["id"]
|
||||
results.append(result)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
@cache_page(60 * 15) # 15 minutes.
|
||||
def opensearch_suggestions(request):
|
||||
"""A simple search view that returns OpenSearch suggestions."""
|
||||
content_type = "application/x-suggestions+json"
|
||||
search_form = SimpleSearchForm(request.GET, auto_id=False)
|
||||
if not search_form.is_valid():
|
||||
return HttpResponseBadRequest(content_type=content_type)
|
||||
|
||||
cleaned = search_form.cleaned_data
|
||||
language = locale_or_default(cleaned["language"] or request.LANGUAGE_CODE)
|
||||
searcher = generate_simple_search(search_form, language, with_highlights=False)
|
||||
searcher = searcher.values_dict("document_title", "question_title", "url")
|
||||
results = searcher[:10]
|
||||
|
||||
def urlize(r):
|
||||
return "%s://%s%s" % (
|
||||
"https" if request.is_secure() else "http",
|
||||
request.get_host(),
|
||||
r["url"][0],
|
||||
)
|
||||
|
||||
def titleize(r):
|
||||
# NB: Elasticsearch returns an array of strings as the value, so we mimic that and
|
||||
# then pull out the first (and only) string.
|
||||
return r.get("document_title", r.get("question_title", [_("No title")]))[0]
|
||||
|
||||
try:
|
||||
data = [cleaned["q"], [titleize(r) for r in results], [], [urlize(r) for r in results]]
|
||||
except ES_EXCEPTIONS:
|
||||
# If we have Elasticsearch problems, we just send back an empty set of results.
|
||||
data = []
|
||||
|
||||
return HttpResponse(json.dumps(data), content_type=content_type)
|
||||
|
||||
|
||||
@cache_page(60 * 60 * 168) # 1 week.
|
||||
def opensearch_plugin(request):
|
||||
"""Render an OpenSearch Plugin."""
|
||||
|
@ -156,33 +35,6 @@ def opensearch_plugin(request):
|
|||
)
|
||||
|
||||
|
||||
def _ternary_filter(ternary_value):
|
||||
"""Return a search query given a TERNARY_YES or TERNARY_NO.
|
||||
|
||||
Behavior for TERNARY_OFF is undefined.
|
||||
|
||||
"""
|
||||
return ternary_value == constants.TERNARY_YES
|
||||
|
||||
|
||||
def _build_es_excerpt(result, first_only=False):
|
||||
"""Return concatenated search excerpts.
|
||||
|
||||
:arg result: The result object from the queryset results
|
||||
:arg first_only: True if we should show only the first bit, False
|
||||
if we should show all bits
|
||||
|
||||
"""
|
||||
bits = [m.strip() for m in chain(*list(result.es_meta.highlight.values()))]
|
||||
|
||||
if first_only and bits:
|
||||
excerpt = bits[0]
|
||||
else:
|
||||
excerpt = EXCERPT_JOINER.join(bits)
|
||||
|
||||
return jinja2.Markup(clean_excerpt(excerpt))
|
||||
|
||||
|
||||
def _fallback_results(locale, product_slugs):
|
||||
"""Return the top 20 articles by votes for the given product(s)."""
|
||||
products = []
|
||||
|
|
|
@ -1,31 +1,26 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
import inspect
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
from functools import wraps
|
||||
from os import getenv
|
||||
from smtplib import SMTPRecipientsRefused
|
||||
import subprocess
|
||||
|
||||
import django_nose
|
||||
import factory.fuzzy
|
||||
from django.conf import settings
|
||||
from django.core.cache import cache
|
||||
from django.test import TestCase as OriginalTestCase
|
||||
from django.test.client import Client
|
||||
from django.test.utils import override_settings
|
||||
from django.utils.translation import trans_real
|
||||
|
||||
import django_nose
|
||||
import factory.fuzzy
|
||||
from elasticutils.contrib.django import get_es
|
||||
from nose.tools import eq_
|
||||
from pyquery import PyQuery
|
||||
from waffle.models import Flag
|
||||
|
||||
from kitsune.search import es_utils
|
||||
from kitsune.search.models import generate_tasks
|
||||
from kitsune.sumo.urlresolvers import reverse, split_path
|
||||
|
||||
|
||||
# We do this gooftastic thing because nose uses unittest.SkipTest in
|
||||
# Python 2.7 which doesn't work with the whole --no-skip thing.
|
||||
# TODO: Check this after the upgrade
|
||||
|
@ -76,63 +71,18 @@ class TestCase(OriginalTestCase):
|
|||
trans_real.activate(settings.LANGUAGE_CODE)
|
||||
super(TestCase, self)._pre_setup()
|
||||
|
||||
def reindex_and_refresh(self):
|
||||
"""Reindexes anything in the db"""
|
||||
from kitsune.search.es_utils import es_reindex_cmd
|
||||
|
||||
es_reindex_cmd()
|
||||
self.refresh(run_tasks=False)
|
||||
|
||||
def setup_indexes(self, empty=False, wait=True):
|
||||
"""(Re-)create write index"""
|
||||
from kitsune.search.es_utils import recreate_indexes
|
||||
|
||||
recreate_indexes()
|
||||
get_es().cluster.health(wait_for_status="yellow")
|
||||
|
||||
def teardown_indexes(self):
|
||||
"""Tear down write index"""
|
||||
for index in es_utils.all_write_indexes():
|
||||
es_utils.delete_index(index)
|
||||
|
||||
@classmethod
|
||||
def setUpClass(cls):
|
||||
super(TestCase, cls).setUpClass()
|
||||
|
||||
if not getattr(settings, "ES_URLS"):
|
||||
cls.skipme = True
|
||||
return
|
||||
|
||||
# try to connect to ES and if it fails, skip ElasticTestCases.
|
||||
if not get_es().ping():
|
||||
cls.skipme = True
|
||||
return
|
||||
|
||||
def setUp(self):
|
||||
if self.skipme:
|
||||
raise SkipTest
|
||||
|
||||
super(TestCase, self).setUp()
|
||||
self.setup_indexes()
|
||||
|
||||
def tearDown(self):
|
||||
super(TestCase, self).tearDown()
|
||||
self.teardown_indexes()
|
||||
|
||||
def refresh(self, run_tasks=True):
|
||||
es = get_es()
|
||||
|
||||
if run_tasks:
|
||||
# Any time we're doing a refresh, we're making sure that
|
||||
# the index is ready to be queried. Given that, it's
|
||||
# almost always the case that we want to run all the
|
||||
# generated tasks, then refresh.
|
||||
generate_tasks()
|
||||
|
||||
for index in es_utils.all_write_indexes():
|
||||
es.indices.refresh(index=index)
|
||||
|
||||
es.cluster.health(wait_for_status="yellow")
|
||||
|
||||
|
||||
def attrs_eq(received, **expected):
|
||||
|
|
|
@ -56,7 +56,6 @@ urlpatterns = [
|
|||
# v2 APIs
|
||||
url(r"^api/2/", include("kitsune.notifications.urls_api")),
|
||||
url(r"^api/2/", include("kitsune.questions.urls_api")),
|
||||
url(r"^api/2/", include("kitsune.search.urls_api")),
|
||||
url(r"^api/2/", include("kitsune.sumo.urls_api")),
|
||||
# These API urls include both v1 and v2 urls.
|
||||
url(r"^api/", include("kitsune.users.urls_api")),
|
||||
|
|
|
@ -1,12 +0,0 @@
|
|||
from django.core.management.base import BaseCommand
|
||||
|
||||
from kitsune.search.tasks import index_task
|
||||
from kitsune.search.utils import to_class_path
|
||||
from kitsune.wiki.models import DocumentMappingType
|
||||
|
||||
|
||||
class Command(BaseCommand):
|
||||
help = "Reindex wiki_document."
|
||||
|
||||
def handle(self, **options):
|
||||
index_task.delay(to_class_path(DocumentMappingType), DocumentMappingType.get_indexable())
|
|
@ -1,6 +1,5 @@
|
|||
import hashlib
|
||||
import logging
|
||||
import time
|
||||
from datetime import datetime, timedelta
|
||||
from urllib.parse import urlparse
|
||||
|
||||
|
@ -21,13 +20,6 @@ from tidings.models import NotificationsMixin
|
|||
|
||||
from kitsune.gallery.models import Image
|
||||
from kitsune.products.models import Product, Topic
|
||||
from kitsune.search.es_utils import UnindexMeBro, es_analyzer_for_locale
|
||||
from kitsune.search.models import (
|
||||
SearchMappingType,
|
||||
SearchMixin,
|
||||
register_for_indexing,
|
||||
register_mapping_type,
|
||||
)
|
||||
from kitsune.sumo.apps import ProgrammingError
|
||||
from kitsune.sumo.models import LocaleField, ModelBase
|
||||
from kitsune.sumo.urlresolvers import reverse, split_path
|
||||
|
@ -51,6 +43,7 @@ from kitsune.wiki.config import (
|
|||
from kitsune.wiki.permissions import DocumentPermissionMixin
|
||||
|
||||
log = logging.getLogger("k.wiki")
|
||||
MAX_REVISION_COMMENT_LENGTH = 255
|
||||
|
||||
|
||||
class TitleCollision(Exception):
|
||||
|
@ -65,9 +58,7 @@ class _NotDocumentView(Exception):
|
|||
"""A URL not pointing to the document view was passed to from_url()."""
|
||||
|
||||
|
||||
class Document(
|
||||
NotificationsMixin, ModelBase, BigVocabTaggableMixin, SearchMixin, DocumentPermissionMixin
|
||||
):
|
||||
class Document(NotificationsMixin, ModelBase, BigVocabTaggableMixin, DocumentPermissionMixin):
|
||||
"""A localized knowledgebase document, not revision-specific."""
|
||||
|
||||
title = models.CharField(max_length=255, db_index=True)
|
||||
|
@ -676,10 +667,6 @@ class Document(
|
|||
revision__document=self, created__gt=start, helpful=True
|
||||
).count()
|
||||
|
||||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
return DocumentMappingType
|
||||
|
||||
def parse_and_calculate_links(self):
|
||||
"""Calculate What Links Here data for links going out from this.
|
||||
|
||||
|
@ -736,158 +723,6 @@ class Document(
|
|||
cache.delete(doc_html_cache_key(self.locale, self.slug))
|
||||
|
||||
|
||||
@register_mapping_type
|
||||
class DocumentMappingType(SearchMappingType):
|
||||
seconds_ago_filter = "current_revision__created__gte"
|
||||
list_keys = ["topic", "product"]
|
||||
|
||||
@classmethod
|
||||
def get_model(cls):
|
||||
return Document
|
||||
|
||||
@classmethod
|
||||
def get_query_fields(cls):
|
||||
return ["document_title", "document_content", "document_summary", "document_keywords"]
|
||||
|
||||
@classmethod
|
||||
def get_localized_fields(cls):
|
||||
# This is the same list as `get_query_fields`, but it doesn't
|
||||
# have to be, which is why it is typed twice.
|
||||
return ["document_title", "document_content", "document_summary", "document_keywords"]
|
||||
|
||||
@classmethod
|
||||
def get_mapping(cls):
|
||||
return {
|
||||
"properties": {
|
||||
# General fields
|
||||
"id": {"type": "long"},
|
||||
"model": {"type": "string", "index": "not_analyzed"},
|
||||
"url": {"type": "string", "index": "not_analyzed"},
|
||||
"indexed_on": {"type": "integer"},
|
||||
"updated": {"type": "integer"},
|
||||
"product": {"type": "string", "index": "not_analyzed"},
|
||||
"topic": {"type": "string", "index": "not_analyzed"},
|
||||
# Document specific fields (locale aware)
|
||||
"document_title": {"type": "string", "analyzer": "snowball"},
|
||||
"document_keywords": {"type": "string", "analyzer": "snowball"},
|
||||
"document_content": {
|
||||
"type": "string",
|
||||
"store": "yes",
|
||||
"analyzer": "snowball",
|
||||
"term_vector": "with_positions_offsets",
|
||||
},
|
||||
"document_summary": {
|
||||
"type": "string",
|
||||
"store": "yes",
|
||||
"analyzer": "snowball",
|
||||
"term_vector": "with_positions_offsets",
|
||||
},
|
||||
# Document specific fields (locale naive)
|
||||
"document_locale": {"type": "string", "index": "not_analyzed"},
|
||||
"document_current_id": {"type": "integer"},
|
||||
"document_parent_id": {"type": "integer"},
|
||||
"document_category": {"type": "integer"},
|
||||
"document_slug": {"type": "string", "index": "not_analyzed"},
|
||||
"document_is_archived": {"type": "boolean"},
|
||||
"document_recent_helpful_votes": {"type": "integer"},
|
||||
"document_display_order": {"type": "integer"},
|
||||
}
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def extract_document(cls, obj_id, obj=None):
|
||||
if obj is None:
|
||||
model = cls.get_model()
|
||||
obj = model.objects.select_related("current_revision", "parent").get(pk=obj_id)
|
||||
|
||||
if obj.html.startswith(REDIRECT_HTML):
|
||||
# It's possible this document is indexed and was turned
|
||||
# into a redirect, so now we want to explicitly unindex
|
||||
# it. The way we do that is by throwing an exception
|
||||
# which gets handled by the indexing machinery.
|
||||
raise UnindexMeBro()
|
||||
|
||||
d = {}
|
||||
d["id"] = obj.id
|
||||
d["model"] = cls.get_mapping_type_name()
|
||||
d["url"] = obj.get_absolute_url()
|
||||
d["indexed_on"] = int(time.time())
|
||||
|
||||
d["topic"] = [t.slug for t in obj.get_topics()]
|
||||
d["product"] = [p.slug for p in obj.get_products()]
|
||||
|
||||
d["document_title"] = obj.title
|
||||
d["document_locale"] = obj.locale
|
||||
d["document_parent_id"] = obj.parent.id if obj.parent else None
|
||||
d["document_content"] = obj.html
|
||||
d["document_category"] = obj.category
|
||||
d["document_slug"] = obj.slug
|
||||
d["document_is_archived"] = obj.is_archived
|
||||
d["document_display_order"] = obj.original.display_order
|
||||
|
||||
d["document_summary"] = obj.summary
|
||||
if obj.current_revision is not None:
|
||||
d["document_keywords"] = obj.current_revision.keywords
|
||||
d["updated"] = int(time.mktime(obj.current_revision.created.timetuple()))
|
||||
d["document_current_id"] = obj.current_revision.id
|
||||
d["document_recent_helpful_votes"] = obj.recent_helpful_votes
|
||||
else:
|
||||
d["document_summary"] = None
|
||||
d["document_keywords"] = None
|
||||
d["updated"] = None
|
||||
d["document_current_id"] = None
|
||||
d["document_recent_helpful_votes"] = 0
|
||||
|
||||
# Don't query for helpful votes if the document doesn't have a current
|
||||
# revision, or is a template, or is a redirect, or is in Navigation
|
||||
# category (50).
|
||||
if (
|
||||
obj.current_revision
|
||||
and not obj.is_template
|
||||
and not obj.html.startswith(REDIRECT_HTML)
|
||||
and not obj.category == 50
|
||||
):
|
||||
d["document_recent_helpful_votes"] = obj.recent_helpful_votes
|
||||
else:
|
||||
d["document_recent_helpful_votes"] = 0
|
||||
|
||||
# Select a locale-appropriate default analyzer for all strings.
|
||||
d["_analyzer"] = es_analyzer_for_locale(obj.locale)
|
||||
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def get_indexable(cls, seconds_ago=0):
|
||||
# This function returns all the indexable things, but we
|
||||
# really need to handle the case where something was indexable
|
||||
# and isn't anymore. Given that, this returns everything that
|
||||
# has a revision.
|
||||
indexable = super(cls, cls).get_indexable(seconds_ago=seconds_ago)
|
||||
indexable = indexable.filter(current_revision__isnull=False)
|
||||
return indexable
|
||||
|
||||
@classmethod
|
||||
def index(cls, document, **kwargs):
|
||||
# If there are no revisions or the current revision is a
|
||||
# redirect, we want to remove it from the index.
|
||||
if document["document_current_id"] is None or document["document_content"].startswith(
|
||||
REDIRECT_HTML
|
||||
):
|
||||
|
||||
cls.unindex(document["id"], es=kwargs.get("es", None))
|
||||
return
|
||||
|
||||
super(cls, cls).index(document, **kwargs)
|
||||
|
||||
|
||||
register_for_indexing("wiki", Document)
|
||||
register_for_indexing("wiki", Document.topics.through, m2m=True)
|
||||
register_for_indexing("wiki", Document.products.through, m2m=True)
|
||||
|
||||
|
||||
MAX_REVISION_COMMENT_LENGTH = 255
|
||||
|
||||
|
||||
class AbstractRevision(models.Model):
|
||||
# **%(class)s** is being used because it will allow a unique reverse name for the field
|
||||
# like created_revisions and created_draftrevisions
|
||||
|
@ -904,7 +739,7 @@ class AbstractRevision(models.Model):
|
|||
abstract = True
|
||||
|
||||
|
||||
class Revision(ModelBase, SearchMixin, AbstractRevision):
|
||||
class Revision(ModelBase, AbstractRevision):
|
||||
"""A revision of a localized knowledgebase document"""
|
||||
|
||||
summary = models.TextField() # wiki markup
|
||||
|
@ -1135,12 +970,8 @@ class Revision(ModelBase, SearchMixin, AbstractRevision):
|
|||
except IndexError:
|
||||
return None
|
||||
|
||||
@classmethod
|
||||
def get_mapping_type(cls):
|
||||
return RevisionMetricsMappingType
|
||||
|
||||
|
||||
class DraftRevision(ModelBase, SearchMixin, AbstractRevision):
|
||||
class DraftRevision(ModelBase, AbstractRevision):
|
||||
based_on = models.ForeignKey(Revision, on_delete=models.CASCADE)
|
||||
content = models.TextField(blank=True)
|
||||
locale = LocaleField(blank=False, db_index=True)
|
||||
|
@ -1149,91 +980,6 @@ class DraftRevision(ModelBase, SearchMixin, AbstractRevision):
|
|||
title = models.CharField(max_length=255, blank=True)
|
||||
|
||||
|
||||
@register_mapping_type
|
||||
class RevisionMetricsMappingType(SearchMappingType):
|
||||
seconds_ago_filter = "created__gte"
|
||||
|
||||
@classmethod
|
||||
def get_model(cls):
|
||||
return Revision
|
||||
|
||||
@classmethod
|
||||
def get_index_group(cls):
|
||||
return "metrics"
|
||||
|
||||
@classmethod
|
||||
def get_mapping(cls):
|
||||
return {
|
||||
"properties": {
|
||||
"id": {"type": "long"},
|
||||
"model": {"type": "string", "index": "not_analyzed"},
|
||||
"url": {"type": "string", "index": "not_analyzed"},
|
||||
"indexed_on": {"type": "integer"},
|
||||
"created": {"type": "date"},
|
||||
"reviewed": {"type": "date"},
|
||||
"locale": {"type": "string", "index": "not_analyzed"},
|
||||
"product": {"type": "string", "index": "not_analyzed"},
|
||||
"is_approved": {"type": "boolean"},
|
||||
"creator_id": {"type": "long"},
|
||||
"reviewer_id": {"type": "long"},
|
||||
}
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def extract_document(cls, obj_id, obj=None):
|
||||
"""Extracts indexable attributes from an Answer."""
|
||||
fields = [
|
||||
"id",
|
||||
"created",
|
||||
"creator_id",
|
||||
"reviewed",
|
||||
"reviewer_id",
|
||||
"is_approved",
|
||||
"document_id",
|
||||
]
|
||||
composed_fields = ["document__locale", "document__slug"]
|
||||
all_fields = fields + composed_fields
|
||||
|
||||
if obj is None:
|
||||
model = cls.get_model()
|
||||
obj_dict = model.objects.values(*all_fields).get(pk=obj_id)
|
||||
else:
|
||||
obj_dict = dict([(field, getattr(obj, field)) for field in fields])
|
||||
obj_dict["document__locale"] = obj.document.locale
|
||||
obj_dict["document__slug"] = obj.document.slug
|
||||
|
||||
d = {}
|
||||
d["id"] = obj_dict["id"]
|
||||
d["model"] = cls.get_mapping_type_name()
|
||||
|
||||
# We do this because get_absolute_url is an instance method
|
||||
# and we don't want to create an instance because it's a DB
|
||||
# hit and expensive. So we do it by hand. get_absolute_url
|
||||
# doesn't change much, so this is probably ok.
|
||||
d["url"] = reverse(
|
||||
"wiki.revision",
|
||||
kwargs={"revision_id": obj_dict["id"], "document_slug": obj_dict["document__slug"]},
|
||||
)
|
||||
|
||||
d["indexed_on"] = int(time.time())
|
||||
|
||||
d["created"] = obj_dict["created"]
|
||||
d["reviewed"] = obj_dict["reviewed"]
|
||||
|
||||
d["locale"] = obj_dict["document__locale"]
|
||||
d["is_approved"] = obj_dict["is_approved"]
|
||||
d["creator_id"] = obj_dict["creator_id"]
|
||||
d["reviewer_id"] = obj_dict["reviewer_id"]
|
||||
|
||||
doc = Document.objects.get(id=obj_dict["document_id"])
|
||||
d["product"] = [p.slug for p in doc.get_products()]
|
||||
|
||||
return d
|
||||
|
||||
|
||||
register_for_indexing("revisions", Revision)
|
||||
|
||||
|
||||
class HelpfulVote(ModelBase):
|
||||
"""Helpful or Not Helpful vote on Revision."""
|
||||
|
||||
|
|
|
@ -1,195 +0,0 @@
|
|||
from datetime import datetime, timedelta
|
||||
|
||||
from nose.tools import eq_
|
||||
|
||||
from kitsune.products.tests import ProductFactory, TopicFactory
|
||||
from kitsune.search.tests.test_es import ElasticTestCase
|
||||
from kitsune.wiki.tests import (
|
||||
DocumentFactory,
|
||||
RevisionFactory,
|
||||
HelpfulVoteFactory,
|
||||
RedirectRevisionFactory,
|
||||
)
|
||||
from kitsune.wiki.models import DocumentMappingType, RevisionMetricsMappingType
|
||||
|
||||
|
||||
class DocumentUpdateTests(ElasticTestCase):
|
||||
def test_add_and_delete(self):
|
||||
"""Adding a doc should add it to the search index; deleting should
|
||||
delete it."""
|
||||
doc = DocumentFactory()
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().count(), 1)
|
||||
|
||||
doc.delete()
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().count(), 0)
|
||||
|
||||
def test_translations_get_parent_tags(self):
|
||||
t1 = TopicFactory(display_order=1)
|
||||
t2 = TopicFactory(display_order=2)
|
||||
p = ProductFactory()
|
||||
doc1 = DocumentFactory(title="Audio too loud", products=[p], topics=[t1, t2])
|
||||
RevisionFactory(document=doc1, is_approved=True)
|
||||
|
||||
doc2 = DocumentFactory(title="Audio too loud bork bork", parent=doc1, tags=["badtag"])
|
||||
RevisionFactory(document=doc2, is_approved=True)
|
||||
|
||||
# Verify the parent has the right tags.
|
||||
doc_dict = DocumentMappingType.extract_document(doc1.id)
|
||||
eq_(sorted(doc_dict["topic"]), sorted([t1.slug, t2.slug]))
|
||||
eq_(doc_dict["product"], [p.slug])
|
||||
|
||||
# Verify the translation has the parent's tags.
|
||||
doc_dict = DocumentMappingType.extract_document(doc2.id)
|
||||
eq_(sorted(doc_dict["topic"]), sorted([t1.slug, t2.slug]))
|
||||
eq_(doc_dict["product"], [p.slug])
|
||||
|
||||
def test_wiki_topics(self):
|
||||
"""Make sure that adding topics to a Document causes it to
|
||||
refresh the index.
|
||||
|
||||
"""
|
||||
t = TopicFactory(slug="hiphop")
|
||||
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
|
||||
doc = DocumentFactory()
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
|
||||
doc.topics.add(t)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 1)
|
||||
doc.topics.clear()
|
||||
self.refresh()
|
||||
|
||||
# Make sure the document itself is still there and that we didn't
|
||||
# accidentally delete it through screwed up signal handling:
|
||||
eq_(DocumentMappingType.search().filter().count(), 1)
|
||||
|
||||
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
|
||||
|
||||
def test_wiki_products(self):
|
||||
"""Make sure that adding products to a Document causes it to
|
||||
refresh the index.
|
||||
|
||||
"""
|
||||
p = ProductFactory(slug="desktop")
|
||||
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
|
||||
doc = DocumentFactory()
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
|
||||
doc.products.add(p)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 1)
|
||||
doc.products.remove(p)
|
||||
self.refresh()
|
||||
|
||||
# Make sure the document itself is still there and that we didn't
|
||||
# accidentally delete it through screwed up signal handling:
|
||||
eq_(DocumentMappingType.search().filter().count(), 1)
|
||||
|
||||
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
|
||||
|
||||
def test_wiki_no_revisions(self):
|
||||
"""Don't index documents without approved revisions"""
|
||||
# Create a document with no revisions and make sure the
|
||||
# document is not in the index.
|
||||
doc = DocumentFactory()
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().count(), 0)
|
||||
# Create a revision that's not approved and make sure the
|
||||
# document is still not in the index.
|
||||
RevisionFactory(document=doc, is_approved=False)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().count(), 0)
|
||||
|
||||
def test_wiki_redirects(self):
|
||||
"""Make sure we don't index redirects"""
|
||||
# First create a revision that doesn't have a redirect and
|
||||
# make sure it's in the index.
|
||||
doc = DocumentFactory(title="wool hats")
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().query(document_title__match="wool").count(), 1)
|
||||
|
||||
# Now create a revision that is a redirect and make sure the
|
||||
# document is removed from the index.
|
||||
RedirectRevisionFactory(document=doc)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().query(document_title__match="wool").count(), 0)
|
||||
|
||||
def test_wiki_keywords(self):
|
||||
"""Make sure updating keywords updates the index."""
|
||||
# Create a document with a revision with no keywords. It
|
||||
# shouldn't show up with a document_keywords term query for
|
||||
# 'wool' since it has no keywords.
|
||||
doc = DocumentFactory(title="wool hats")
|
||||
RevisionFactory(document=doc, is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().query(document_keywords="wool").count(), 0)
|
||||
|
||||
RevisionFactory(document=doc, is_approved=True, keywords="wool")
|
||||
self.refresh()
|
||||
|
||||
eq_(DocumentMappingType.search().query(document_keywords="wool").count(), 1)
|
||||
|
||||
def test_recent_helpful_votes(self):
|
||||
"""Recent helpful votes are indexed properly."""
|
||||
# Create a document and verify it doesn't show up in a
|
||||
# query for recent_helpful_votes__gt=0.
|
||||
r = RevisionFactory(is_approved=True)
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
|
||||
|
||||
# Add an unhelpful vote, it still shouldn't show up.
|
||||
HelpfulVoteFactory(revision=r, helpful=False)
|
||||
r.document.save() # Votes don't trigger a reindex.
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
|
||||
|
||||
# Add a helpful vote created 31 days ago, it still shouldn't show up.
|
||||
created = datetime.now() - timedelta(days=31)
|
||||
HelpfulVoteFactory(revision=r, helpful=True, created=created)
|
||||
r.document.save() # Votes don't trigger a reindex.
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
|
||||
|
||||
# Add a helpful vote created 29 days ago, it should show up now.
|
||||
created = datetime.now() - timedelta(days=29)
|
||||
HelpfulVoteFactory(revision=r, helpful=True, created=created)
|
||||
r.document.save() # Votes don't trigger a reindex.
|
||||
self.refresh()
|
||||
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 1)
|
||||
|
||||
|
||||
class RevisionMetricsTests(ElasticTestCase):
|
||||
def test_add_and_delete(self):
|
||||
"""Adding a revision should add it to the index.
|
||||
|
||||
Deleting should delete it.
|
||||
"""
|
||||
r = RevisionFactory()
|
||||
self.refresh()
|
||||
eq_(RevisionMetricsMappingType.search().count(), 1)
|
||||
|
||||
r.delete()
|
||||
self.refresh()
|
||||
eq_(RevisionMetricsMappingType.search().count(), 0)
|
||||
|
||||
def test_data_in_index(self):
|
||||
"""Verify the data we are indexing."""
|
||||
p = ProductFactory()
|
||||
base_doc = DocumentFactory(locale="en-US", products=[p])
|
||||
d = DocumentFactory(locale="es", parent=base_doc)
|
||||
r = RevisionFactory(document=d, is_approved=True)
|
||||
|
||||
self.refresh()
|
||||
|
||||
eq_(RevisionMetricsMappingType.search().count(), 1)
|
||||
data = RevisionMetricsMappingType.search()[0]
|
||||
eq_(data["is_approved"], r.is_approved)
|
||||
eq_(data["locale"], d.locale)
|
||||
eq_(data["product"], [p.slug])
|
||||
eq_(data["creator_id"], r.creator_id)
|