This commit is contained in:
Tasos Katsoulas 2021-05-26 17:36:32 +03:00
Parent 422aba74f6
Commit 3ebcbd5382
57 changed files with 71 additions and 5925 deletions

View File

@ -54,7 +54,7 @@ build-full: .docker-build-pull
touch .docker-build-full
pull: .env
-GIT_COMMIT_SHORT= ${DC} pull base base-dev staticfiles locales full-no-locales full mariadb elasticsearch redis
-GIT_COMMIT_SHORT= ${DC} pull base base-dev staticfiles locales full-no-locales full mariadb redis
touch .docker-build-pull
rebuild: clean build

View File

@ -11,7 +11,6 @@ services:
tty: true
depends_on:
- mariadb
- elasticsearch
- elasticsearch7
- kibana
- redis
@ -30,7 +29,6 @@ services:
user: ${UID:-kitsune}
depends_on:
- mariadb
- elasticsearch
- elasticsearch7
- redis
@ -48,7 +46,6 @@ services:
env_file: .env-test
depends_on:
- mariadb
- elasticsearch
- elasticsearch7
- redis
@ -140,12 +137,6 @@ services:
volumes:
- mysqlvolume:/var/lib/mysql
elasticsearch:
image: elasticsearch:2.4
ports:
- "9201:9200"
- "9301:9300"
elasticsearch7:
image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
environment:

View File

@ -27,8 +27,7 @@ Part 2: Developer's Guide
wsgi
email
localization
searchchapter
search-v2
search
frontend
notes

View File

View File

@ -1,354 +0,0 @@
.. _search-chapter:
======
Search
======
.. warning::
This section of documentation may be outdated.
See :any:`search-v2` for up-to-date (but partial) documentation.
Kitsune uses `Elasticsearch <https://www.elastic.co/>`_ to
power its on-site search facility.
It gives us a number of advantages over MySQL's full-text search or
Google's site search.
* Much faster than MySQL.
* Reduces the load on MySQL.
* We have total control over what results look like.
* We can adjust search results using content that isn't shown on the page.
* We don't rely on Google reindexing the site.
* We can fine-tune the algorithm and scoring.
Installing Elasticsearch
========================
There's an installation guide on the Elasticsearch site:
https://www.elastic.co/guide/en/elasticsearch/reference/1.3/setup-service.html
We're currently using `1.2.4 <https://www.elastic.co/downloads/past-releases/elasticsearch-1-2-4>`_
in production.
The directory you install Elasticsearch in will hereafter be referred
to as ``ELASTICDIR``.
You can configure Elasticsearch with the configuration file at
``ELASTICDIR/config/elasticsearch.yml``.
Elasticsearch uses several settings in ``kitsune/settings.py`` that you
need to override in ``kitsune/settings_local.py``. Here's an example::
# Connection information for Elastic
ES_URLS = ['http://127.0.0.1:9200']
ES_INDEXES = {'default': 'sumo_dev'}
ES_WRITE_INDEXES = ES_INDEXES
These settings explained:
``ES_URLS``
Defaults to ``['http://127.0.0.1:9200']``.
Points to the url for your Elasticsearch instance.
.. Warning::
The url must match the host and port in
``ELASTICDIR/config/elasticsearch.yml``. So if you change it in
one place, you must also change it in the other.
``ES_INDEXES``
Mapping of ``'default'`` to the name of the index used for
searching.
The index name must be prefixed with the value of
``ES_INDEX_PREFIX``.
Examples if ``ES_INDEX_PREFIX`` is set to ``'sumo'``::
ES_INDEXES = {'default': 'sumo'}
ES_INDEXES = {'default': 'sumo_20120213'}
ES_INDEXES = {'default': 'tofurkey'} # WRONG!
``ES_WRITE_INDEXES``
Mapping of ``'default'`` to the name of the index used for
indexing.
The index name must be prefixed with the value of
``ES_INDEX_PREFIX``.
Examples if ``ES_INDEX_PREFIX`` is set to ``'sumo'``::
ES_WRITE_INDEXES = ES_INDEXES
ES_WRITE_INDEXES = {'default': 'sumo'}
ES_WRITE_INDEXES = {'default': 'sumo_20120213'}
ES_WRITE_INDEXES = {'default': 'tofurkey'} # WRONG!
.. Note::
The separate roles for indexes allow us to push mapping
changes to production. In the first push, we'll push the
mapping change and give ``ES_WRITE_INDEXES`` a different
value. Then we reindex into the new index. Then we push a
change updating ``ES_INDEXES`` to equal ``ES_WRITE_INDEXES``
allowing the search code to use the new index.
If you're a developer, the best thing to do is have your
``ES_WRITE_INDEXES`` be the same as ``ES_INDEXES``. That way
you can reindex and search and you don't have to fiddle with
settings in between.
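As a rough sketch of that two-push deploy (the index names here are
made up for illustration)::

    # Push 1: keep reading from the old index, but reindex into a new one.
    ES_INDEXES = {'default': 'sumo_20120101'}
    ES_WRITE_INDEXES = {'default': 'sumo_20120213'}

    # Push 2, once reindexing has finished: read from the new index too.
    ES_INDEXES = {'default': 'sumo_20120213'}
    ES_WRITE_INDEXES = ES_INDEXES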
There are a few other settings you can set in your
``kitsune/settings_local.py`` file that override ElasticUtils defaults. See
`the ElasticUtils docs
<https://elasticutils.readthedocs.io/en/latest/django.html#configuration>`_
for details.
Other things you can change:
``ES_INDEX_PREFIX``
Defaults to ``'sumo'``.
All indexes for this application must start with the index
prefix. Indexes that don't start with the index prefix won't show
up in index listings and cannot be deleted through the esdelete
subcommand or the search admin.
.. Note::
The index names in both ``ES_INDEXES`` and ``ES_WRITE_INDEXES``
**must** start with this prefix.
``ES_LIVE_INDEXING``
Defaults to False.
You can also set ``ES_LIVE_INDEXING`` in your
``kitsune/settings_local.py`` file. This affects whether Kitsune does
Elasticsearch indexing when data changes in the ``post_save`` and
``pre_delete`` hooks.
For tests, ``ES_LIVE_INDEXING`` is set to ``False`` except for
Elasticsearch-specific tests, so we're not spending a ton of time
indexing things we're not using.
``ES_TIMEOUT``
Defaults to 5.
This affects timeouts for search-related requests.
If you're having problems with ES being slow, raising this number
might be helpful.
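Putting the optional settings together, a development
``kitsune/settings_local.py`` might look something like this (the
values are only illustrative)::

    ES_URLS = ['http://127.0.0.1:9200']
    ES_INDEXES = {'default': 'sumo_dev'}
    ES_WRITE_INDEXES = ES_INDEXES

    ES_INDEX_PREFIX = 'sumo'   # all index names must start with this
    ES_LIVE_INDEXING = True    # index in post_save/pre_delete hooks
    ES_TIMEOUT = 10            # raise this if Elasticsearch is slow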
Using Elasticsearch
===================
Running
-------
Start Elasticsearch by::
$ ELASTICDIR/bin/elasticsearch
That launches Elasticsearch in the background.
Indexing
--------
Do a complete reindexing of everything by::
$ ./manage.py esreindex
This will delete the existing index specified by ``ES_WRITE_INDEXES``,
create a new one, and reindex everything in your database. On my
machine it takes under an hour.
If you need to get stuff done and don't want to wait for a full
indexing, you can index a percentage of things.
For example, this indexes 10% of your data ordered by id::
$ ./manage.py esreindex --percent 10
This indexes 50% of your data ordered by id::
$ ./manage.py esreindex --percent 50
I use this when I'm fiddling with mappings and the indexing code.
Another way of specifying a smaller number of things to index is by
indicating how recently updated things should be to be included::
$ ./manage.py esreindex --hours-ago 2
$ ./manage.py esreindex --minutes-ago 20
$ ./manage.py esreindex --seconds-ago 90
Those options can be combined if you wish. Different indexes have
different ways of determining how long ago something was updated, but as
a whole this should reindex everything in every index (or in the indexes
specified with the --mapping_types option) that was updated within the
time window you specify.
You can also specify which mapping_types to index::
$ ./manage.py esreindex --mapping_types questions_question,wiki_document
See ``--help`` for more details::
$ ./manage.py esreindex --help
.. Note::
Once you've indexed everything, if you have ``ES_LIVE_INDEXING``
set to ``True``, you won't have to do it again unless indexing code
changes. The models have ``post_save`` and ``pre_delete`` hooks
that will update the index as the data changes.
.. Note::
If you kick off indexing with the admin, then indexing gets done in
chunks by celery tasks. If you need to halt indexing, you can purge
the tasks with::
$ celery -A kitsune purge
If you do this often, it helps to write a shell script for it.
Health/statistics
-----------------
You can see Elasticsearch index status with::
$ ./manage.py esstatus
This lists the indexes, tells you which ones are set to read and
write, and tells you how many documents are in the indexes by mapping
type.
Deleting indexes
----------------
You can use the search admin to delete the index.
On the command line, you can do::
$ ./manage.py esdelete <index-name>
Implementation details
----------------------
Kitsune uses `elasticutils <https://github.com/mozilla/elasticutils>`_
and `pyelasticsearch
<https://pyelasticsearch.readthedocs.io/en/latest/>`_.
Most of our code is in the ``search`` app in ``kitsune/search/``.
Models in Kitsune that are indexable use ``SearchMixin`` defined in
``models.py``.
Utility functions are implemented in ``es_utils.py``.
Sub commands for ``manage.py`` are implemented in
``management/commands/``.
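For orientation, here is a minimal, hypothetical sketch of what an
indexable model looked like under this scheme. It mirrors the pattern
of the mapping types removed in this commit (``ThreadMappingType``,
``QuestionMappingType``); the ``Widget`` model itself is invented::

    from kitsune.search.models import (
        SearchMappingType,
        SearchMixin,
        register_for_indexing,
        register_mapping_type,
    )
    from kitsune.sumo.models import ModelBase


    class Widget(ModelBase, SearchMixin):
        @classmethod
        def get_mapping_type(cls):
            return WidgetMappingType


    @register_mapping_type
    class WidgetMappingType(SearchMappingType):
        @classmethod
        def get_model(cls):
            return Widget

        @classmethod
        def get_mapping(cls):
            # Elasticsearch 1.x-style mapping for the indexed fields.
            return {"properties": {"id": {"type": "long"}}}

        @classmethod
        def extract_document(cls, obj_id, obj=None):
            # Build the document that gets written to the index.
            if obj is None:
                obj = cls.get_model().objects.get(pk=obj_id)
            return {"id": obj.id}


    # Keep the index in sync via post_save/pre_delete hooks.
    register_for_indexing("widgets", Widget)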
Searching on the site
=====================
Scoring
-------
These are the default weights that apply to all searches:
wiki (aka kb)::
document_title__match 6
document_content__match 1
document_keywords__match 8
document_summary__match 2
questions (aka support forums)::
question_title__match 4
question_content__match 3
question_answer_content__match 3
forums (aka contributor forums)::
post_title__match 2
post_content__match 1
Elasticsearch is built on top of Lucene so the `Lucene documentation
on scoring
<http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html>`_
covers how a document is scored in regards to the search query and its
contents. The weights modify that---they're query-level boosts.
Additionally, `this blog post from 2006 <http://www.supermind.org/blog/378>`_
is really helpful in terms of providing insight on the implications of
the way things are scored.
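To make "query-level boosts" concrete: in the Elasticsearch 1.x query
DSL of that era, the wiki weights above translate into roughly the
following (a sketch only; the query Kitsune actually built via
ElasticUtils was more involved)::

    query = {
        "bool": {
            "should": [
                {"match": {"document_title": {"query": "crash", "boost": 6}}},
                {"match": {"document_content": {"query": "crash", "boost": 1}}},
                {"match": {"document_keywords": {"query": "crash", "boost": 8}}},
                {"match": {"document_summary": {"query": "crash", "boost": 2}}},
            ]
        }
    }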
Filters
-------
We use a series of filters on document_tag, question_tag, and other
properties of documents like `has_helpful`, `is_locked`, `is_archived`,
etc.
In Elasticsearch, filters remove items from the result set, but don't
affect the scoring.
We cannot apply weights to filtered fields.
Regular search
--------------
You can start a `regular` search from the front page or from the
search form on any article page.
Regular search does the following:
1. searches only kb and support forums
2. (filter) kb articles are tagged with the product (e.g. "desktop")
3. (filter) kb articles must not be archived
4. (filter) kb articles must be in the Troubleshooting (10) or
How-to (20) categories
5. (filter) support forum posts tagged with the product
(e.g. "desktop")
6. (filter) support forum posts must have an answer marked as helpful
7. (filter) support forum posts must not be archived
It scores as specified above.
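As a sketch, filters 5-7 above map onto the question fields from the
mapping removed in this commit (``product``, ``question_has_helpful``,
``question_is_archived``) roughly like this in the raw Elasticsearch
DSL; the code itself expressed filters through ElasticUtils ``F``
objects, as in the removed ``api.py``::

    question_filter = {
        "bool": {
            "must": [
                {"term": {"product": "desktop"}},
                {"term": {"question_has_helpful": True}},
                {"term": {"question_is_archived": False}},
            ]
        }
    }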
Ask A Question search
---------------------
An `Ask a question` or `AAQ` search is any search that is performed within
the AAQ workflow. The only difference from `regular` search is that `AAQ`
search also shows support forum posts that have no answer marked as helpful.

View File

@ -1,21 +1,21 @@
from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.forums.tests import ThreadFactory
from kitsune.questions.tests import AnswerFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.templatetags.jinja_helpers import urlparams
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.users.tests import UserFactory
from kitsune.wiki.tests import DocumentFactory, RevisionFactory, ApprovedRevisionFactory
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, RevisionFactory
class UserSearchTests(ElasticTestCase):
class UserSearchTests(Elastic7TestCase):
"""Tests for the Community Hub user search page."""
client_class = LocalizingClient
search_tests = True
def test_no_results(self):
UserFactory(username="foo", profile__name="Foo Bar")
@ -45,10 +45,11 @@ class UserSearchTests(ElasticTestCase):
eq_(len(doc(".results-user")), 2)
class LandingTests(ElasticTestCase):
class LandingTests(Elastic7TestCase):
"""Tests for the Community Hub landing page."""
client_class = LocalizingClient
search_tests = True
def test_top_contributors(self):
"""Verify the top contributors appear."""
@ -104,9 +105,11 @@ class LandingTests(ElasticTestCase):
assert "we are SUMO!" in doc("#recent-threads td").html()
class TopContributorsTests(ElasticTestCase):
class TopContributorsTests(Elastic7TestCase):
"""Tests for the Community Hub top contributors page."""
search_tests = True
client_class = LocalizingClient
def test_invalid_area(self):

View File

@ -1,4 +1,4 @@
from datetime import datetime, date, timedelta
from datetime import date, datetime, timedelta
from nose.tools import eq_
@ -9,13 +9,14 @@ from kitsune.community.utils import (
)
from kitsune.products.tests import ProductFactory
from kitsune.questions.tests import AnswerFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
class TopContributorTests(ElasticTestCase):
class TopContributorTests(Elastic7TestCase):
client_class = LocalizingClient
search_tests = True
def test_top_contributors_kb(self):
d = DocumentFactory(locale="en-US")
@ -24,8 +25,6 @@ class TopContributorTests(ElasticTestCase):
RevisionFactory(document=d)
r4 = RevisionFactory(document=d, created=date.today() - timedelta(days=91))
self.refresh()
# By default, we should only get 2 top contributors back.
top, _ = top_contributors_kb()
eq_(2, len(top))

View File

@ -1,14 +1,15 @@
from nose.tools import eq_
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.search.tests import ElasticTestCase
class ContributorsMetricsTests(ElasticTestCase):
class ContributorsMetricsTests(Elastic7TestCase):
"""Tests for the Community Hub user search page."""
client_class = LocalizingClient
search_tests = True
def test_it_works(self):
url = reverse("community.metrics")

View File

@ -4,9 +4,6 @@ from datetime import datetime
from django.contrib.auth.models import User
from django.db import models
from kitsune.search.models import (
SearchMixin,
)
from kitsune.sumo.models import ModelBase
@ -58,7 +55,7 @@ class Tweet(ModelBase):
return tweet["text"]
class Reply(ModelBase, SearchMixin):
class Reply(ModelBase):
"""A reply from an AoA contributor.
The Tweet table gets truncated regularly so we can't use it for metrics.

View File

@ -1,27 +1,19 @@
import datetime
import time
from django.contrib.auth.models import User
from django.contrib.contenttypes.fields import GenericRelation
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from django.db.models import Q
from django.db.models.signals import pre_save
from django.contrib.contenttypes.fields import GenericRelation
from django.contrib.auth.models import User
from tidings.models import NotificationsMixin
from kitsune import forums
from kitsune.access.utils import has_perm, perm_is_defined_on
from kitsune.flagit.models import FlaggedObject
from kitsune.sumo.models import ModelBase
from kitsune.sumo.templatetags.jinja_helpers import urlparams, wiki_to_html
from kitsune.sumo.urlresolvers import reverse
from kitsune.sumo.models import ModelBase
from kitsune.search.models import (
SearchMappingType,
SearchMixin,
register_for_indexing,
register_mapping_type,
)
def _last_post_from(posts, exclude_post=None):
@ -108,7 +100,7 @@ class Forum(NotificationsMixin, ModelBase):
return [f for f in Forum.objects.all() if f.allows_viewing_by(user)]
class Thread(NotificationsMixin, ModelBase, SearchMixin):
class Thread(NotificationsMixin, ModelBase):
title = models.CharField(max_length=255)
forum = models.ForeignKey("Forum", on_delete=models.CASCADE)
created = models.DateTimeField(default=datetime.datetime.now, db_index=True)
@ -190,104 +182,6 @@ class Thread(NotificationsMixin, ModelBase, SearchMixin):
# If self.last_post is None, and this was called from Post.delete,
# then Post.delete will erase the thread, as well.
@classmethod
def get_mapping_type(cls):
return ThreadMappingType
@register_mapping_type
class ThreadMappingType(SearchMappingType):
seconds_ago_filter = "last_post__created__gte"
@classmethod
def search(cls):
return super(ThreadMappingType, cls).search().order_by("created")
@classmethod
def get_model(cls):
return Thread
@classmethod
def get_query_fields(cls):
return ["post_title", "post_content"]
@classmethod
def get_mapping(cls):
return {
"properties": {
"id": {"type": "long"},
"model": {"type": "string", "index": "not_analyzed"},
"url": {"type": "string", "index": "not_analyzed"},
"indexed_on": {"type": "integer"},
"created": {"type": "integer"},
"updated": {"type": "integer"},
"post_forum_id": {"type": "integer"},
"post_title": {"type": "string", "analyzer": "snowball"},
"post_is_sticky": {"type": "boolean"},
"post_is_locked": {"type": "boolean"},
"post_author_id": {"type": "integer"},
"post_author_ord": {"type": "string", "index": "not_analyzed"},
"post_content": {
"type": "string",
"analyzer": "snowball",
"store": "yes",
"term_vector": "with_positions_offsets",
},
"post_replies": {"type": "integer"},
}
}
@classmethod
def extract_document(cls, obj_id, obj=None):
"""Extracts interesting thing from a Thread and its Posts"""
if obj is None:
model = cls.get_model()
obj = model.objects.select_related("last_post").get(pk=obj_id)
d = {}
d["id"] = obj.id
d["model"] = cls.get_mapping_type_name()
d["url"] = obj.get_absolute_url()
d["indexed_on"] = int(time.time())
# TODO: Sphinx stores created and updated as seconds since the
# epoch, so we convert them to that format here so that the
# search view works correctly. When we ditch Sphinx, we should
# see if it's faster to filter on ints or whether we should
# switch them to dates.
d["created"] = int(time.mktime(obj.created.timetuple()))
if obj.last_post is not None:
d["updated"] = int(time.mktime(obj.last_post.created.timetuple()))
else:
d["updated"] = None
d["post_forum_id"] = obj.forum.id
d["post_title"] = obj.title
d["post_is_sticky"] = obj.is_sticky
d["post_is_locked"] = obj.is_locked
d["post_replies"] = obj.replies
author_ids = set()
author_ords = set()
content = []
posts = Post.objects.filter(thread_id=obj.id).select_related("author")
for post in posts:
author_ids.add(post.author.id)
author_ords.add(post.author.username)
content.append(post.content)
d["post_author_id"] = list(author_ids)
d["post_author_ord"] = list(author_ords)
d["post_content"] = content
return d
register_for_indexing("forums", Thread)
class Post(ModelBase):
thread = models.ForeignKey("Thread", on_delete=models.CASCADE)
@ -368,9 +262,6 @@ class Post(ModelBase):
return wiki_to_html(self.content)
register_for_indexing("forums", Post, instance_to_indexee=lambda p: p.thread)
def user_pre_save(sender, instance, **kw):
"""When a user's username is changed, we must reindex the threads
they participated in.

View File

@ -1,56 +0,0 @@
from nose.tools import eq_
from kitsune.forums.models import ThreadMappingType
from kitsune.forums.tests import ThreadFactory, PostFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.users.tests import UserFactory
class TestPostUpdate(ElasticTestCase):
def test_added(self):
# Nothing exists before the test starts
eq_(ThreadMappingType.search().count(), 0)
# Creating a new Thread does create a new document in the index.
new_thread = ThreadFactory()
self.refresh()
eq_(ThreadMappingType.search().count(), 1)
# Saving a new post in a thread doesn't create a new
# document in the index. Therefore, the count remains 1.
#
# TODO: This is ambiguous: it's not clear whether we correctly
# updated the document in the index or whether the post_save
# hook didn't kick off. Need a better test.
PostFactory(thread=new_thread)
self.refresh()
eq_(ThreadMappingType.search().count(), 1)
def test_deleted(self):
# Nothing exists before the test starts
eq_(ThreadMappingType.search().count(), 0)
# Creating a new Thread does create a new document in the index.
new_thread = ThreadFactory()
self.refresh()
eq_(ThreadMappingType.search().count(), 1)
# Deleting the thread deletes the document in the index.
new_thread.delete()
self.refresh()
eq_(ThreadMappingType.search().count(), 0)
def test_thread_is_reindexed_on_username_change(self):
search = ThreadMappingType.search()
u = UserFactory(username="dexter")
ThreadFactory(creator=u, title="Hello")
self.refresh()
eq_(search.query(post_title="hello")[0]["post_author_ord"], ["dexter"])
# Change the username and verify the index.
u.username = "walter"
u.save()
self.refresh()
eq_(search.query(post_title="hello")[0]["post_author_ord"], ["walter"])

View File

@ -2,11 +2,13 @@ from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.products.tests import ProductFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.urlresolvers import reverse
class HomeTestCase(ElasticTestCase):
class HomeTestCase(Elastic7TestCase):
search_tests = True
def test_home(self):
"""Verify that home page renders products."""

View File

@ -1,22 +1,19 @@
from django.conf import settings
from django.core.cache import cache
from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.products.models import HOT_TOPIC_SLUG
from kitsune.products.tests import ProductFactory, TopicFactory
from kitsune.questions.models import QuestionLocale
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.urlresolvers import reverse
from kitsune.wiki.tests import (
DocumentFactory,
ApprovedRevisionFactory,
HelpfulVoteFactory,
)
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, HelpfulVoteFactory
class ProductViewsTestCase(ElasticTestCase):
class ProductViewsTestCase(Elastic7TestCase):
search_tests = True
def test_products(self):
"""Verify that /products page renders products."""
# Create some products.

View File

@ -1,15 +1,11 @@
import logging
import time
from datetime import datetime, timedelta
from django.conf import settings
from django.core.management.base import BaseCommand
from django.db import connection, transaction
from kitsune.questions.models import Question, QuestionMappingType, Answer
from kitsune.search.es_utils import ES_EXCEPTIONS, get_documents
from kitsune.search.tasks import index_task
from kitsune.search.utils import to_class_path
from kitsune.questions.models import Question, Answer
from kitsune.search.v2.es7_utils import index_objects_bulk
@ -56,38 +52,3 @@ class Command(BaseCommand):
)
index_objects_bulk.delay("QuestionDocument", q_ids)
index_objects_bulk.delay("AnswerDocument", answer_ids)
# elastic v2 code:
try:
# So... the first time this runs, it'll handle 160K
# questions or so which stresses everything. Thus we
# do it in chunks because otherwise this won't work.
#
# After we've done this for the first time, we can nix
# the chunking code.
from kitsune.search.utils import chunked
for chunk in chunked(q_ids, 100):
# Fetch all the documents we need to update.
es_docs = get_documents(QuestionMappingType, chunk)
log.info("Updating %d index documents", len(es_docs))
documents = []
# For each document, update the data and stick it
# back in the index.
for doc in es_docs:
doc["question_is_archived"] = True
doc["indexed_on"] = int(time.time())
documents.append(doc)
QuestionMappingType.bulk_index(documents)
except ES_EXCEPTIONS:
# Something happened with ES, so let's push index
# updating into an index_task which retries when it
# fails because of ES issues.
index_task.delay(to_class_path(QuestionMappingType), q_ids)

View File

@ -1,6 +1,5 @@
import logging
import re
import time
from datetime import date, datetime, timedelta
from urllib.parse import urlparse
@ -21,22 +20,13 @@ from django.urls import resolve
from django.utils.translation import pgettext, override as translation_override
from elasticsearch7 import ElasticsearchException
from product_details import product_details
from taggit.models import Tag, TaggedItem
from taggit.models import Tag
from kitsune.flagit.models import FlaggedObject
from kitsune.products.models import Product, Topic
from kitsune.questions import config
from kitsune.questions.managers import AnswerManager, QuestionLocaleManager, QuestionManager
from kitsune.questions.tasks import update_answer_pages, update_question_votes
from kitsune.search.es_utils import UnindexMeBro
from kitsune.search.models import (
SearchMappingType,
SearchMixin,
register_for_indexing,
register_mapping_type,
)
from kitsune.search.tasks import index_task
from kitsune.search.utils import to_class_path
from kitsune.sumo.models import LocaleField, ModelBase
from kitsune.sumo.templatetags.jinja_helpers import urlparams, wiki_to_html
from kitsune.sumo.urlresolvers import reverse, split_path
@ -57,7 +47,7 @@ class AlreadyTakenException(Exception):
pass
class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
class Question(ModelBase, BigVocabTaggableMixin):
"""A support question."""
title = models.CharField(max_length=255)
@ -369,10 +359,6 @@ class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
cache.add(cache_key, tags, settings.CACHE_MEDIUM_TIMEOUT)
return tags
@classmethod
def get_mapping_type(cls):
return QuestionMappingType
@classmethod
def get_serializer(cls, serializer_type="full"):
# Avoid circular import
@ -698,171 +684,6 @@ class Question(ModelBase, BigVocabTaggableMixin, SearchMixin):
return images
@register_mapping_type
class QuestionMappingType(SearchMappingType):
seconds_ago_filter = "updated__gte"
list_keys = [
"topic",
"product",
"question_tag",
"question_answer_content",
"question_answer_creator",
]
@classmethod
def get_model(cls):
return Question
@classmethod
def get_query_fields(cls):
return ["question_title", "question_content", "question_answer_content"]
@classmethod
def get_localized_fields(cls):
# This is the same list as `get_query_fields`, but it doesn't
# have to be, which is why it is typed twice.
return ["question_title", "question_content", "question_answer_content"]
@classmethod
def get_mapping(cls):
return {
"properties": {
"id": {"type": "long"},
"model": {"type": "string", "index": "not_analyzed"},
"url": {"type": "string", "index": "not_analyzed"},
"indexed_on": {"type": "integer"},
"created": {"type": "integer"},
"updated": {"type": "integer"},
"product": {"type": "string", "index": "not_analyzed"},
"topic": {"type": "string", "index": "not_analyzed"},
"question_title": {"type": "string", "analyzer": "snowball"},
"question_content": {
"type": "string",
"analyzer": "snowball",
# TODO: Stored because originally, this is the
# only field we were excerpting on. Standardize
# one way or the other.
"store": "yes",
"term_vector": "with_positions_offsets",
},
"question_answer_content": {"type": "string", "analyzer": "snowball"},
"question_num_answers": {"type": "integer"},
"question_is_solved": {"type": "boolean"},
"question_is_locked": {"type": "boolean"},
"question_is_archived": {"type": "boolean"},
"question_has_answers": {"type": "boolean"},
"question_has_helpful": {"type": "boolean"},
"question_creator": {"type": "string", "index": "not_analyzed"},
"question_answer_creator": {"type": "string", "index": "not_analyzed"},
"question_num_votes": {"type": "integer"},
"question_num_votes_past_week": {"type": "integer"},
"question_tag": {"type": "string", "index": "not_analyzed"},
"question_locale": {"type": "string", "index": "not_analyzed"},
}
}
@classmethod
def extract_document(cls, obj_id, obj=None):
"""Extracts indexable attributes from a Question and its answers."""
fields = [
"id",
"title",
"content",
"num_answers",
"solution_id",
"is_locked",
"is_archived",
"created",
"updated",
"num_votes_past_week",
"locale",
"product_id",
"topic_id",
"is_spam",
]
composed_fields = ["creator__username"]
all_fields = fields + composed_fields
if obj is None:
# Note: Need to keep this in sync with
# tasks.update_question_vote_chunk.
model = cls.get_model()
obj = model.objects.values(*all_fields).get(pk=obj_id)
else:
fixed_obj = dict([(field, getattr(obj, field)) for field in fields])
fixed_obj["creator__username"] = obj.creator.username
obj = fixed_obj
if obj["is_spam"]:
raise UnindexMeBro()
d = {}
d["id"] = obj["id"]
d["model"] = cls.get_mapping_type_name()
# We do this because get_absolute_url is an instance method
# and we don't want to create an instance because it's a DB
# hit and expensive. So we do it by hand. get_absolute_url
# doesn't change much, so this is probably ok.
d["url"] = reverse("questions.details", kwargs={"question_id": obj["id"]})
d["indexed_on"] = int(time.time())
d["created"] = int(time.mktime(obj["created"].timetuple()))
d["updated"] = int(time.mktime(obj["updated"].timetuple()))
topics = Topic.objects.filter(id=obj["topic_id"])
products = Product.objects.filter(id=obj["product_id"])
d["topic"] = [t.slug for t in topics]
d["product"] = [p.slug for p in products]
d["question_title"] = obj["title"]
d["question_content"] = obj["content"]
d["question_num_answers"] = obj["num_answers"]
d["question_is_solved"] = bool(obj["solution_id"])
d["question_is_locked"] = obj["is_locked"]
d["question_is_archived"] = obj["is_archived"]
d["question_has_answers"] = bool(obj["num_answers"])
d["question_creator"] = obj["creator__username"]
d["question_num_votes"] = QuestionVote.objects.filter(question=obj["id"]).count()
d["question_num_votes_past_week"] = obj["num_votes_past_week"]
d["question_tag"] = list(
TaggedItem.tags_for(Question, Question(pk=obj_id)).values_list("name", flat=True)
)
d["question_locale"] = obj["locale"]
answer_values = list(
Answer.objects.filter(question=obj_id, is_spam=False).values_list(
"content", "creator__username"
)
)
d["question_answer_content"] = [a[0] for a in answer_values]
d["question_answer_creator"] = list(set(a[1] for a in answer_values))
if not answer_values:
d["question_has_helpful"] = False
else:
d["question_has_helpful"] = (
Answer.objects.filter(question=obj_id).filter(votes__helpful=True).exists()
)
return d
register_for_indexing("questions", Question)
register_for_indexing(
"questions",
TaggedItem,
instance_to_indexee=(
lambda i: (i.content_object if isinstance(i.content_object, Question) else None)
),
)
class QuestionMetaData(ModelBase):
"""Metadata associated with a support question."""
@ -933,7 +754,7 @@ class QuestionLocale(ModelBase):
verbose_name = "AAQ enabled locale"
class Answer(ModelBase, SearchMixin):
class Answer(ModelBase):
"""An answer to a support question."""
question = models.ForeignKey("Question", on_delete=models.CASCADE, related_name="answers")
@ -1153,10 +974,6 @@ class Answer(ModelBase, SearchMixin):
cache.add(cache_key, images, settings.CACHE_MEDIUM_TIMEOUT)
return images
@classmethod
def get_mapping_type(cls):
return AnswerMetricsMappingType
@classmethod
def get_serializer(cls, serializer_type="full"):
# Avoid circular import
@ -1177,113 +994,6 @@ class Answer(ModelBase, SearchMixin):
self.save()
@register_mapping_type
class AnswerMetricsMappingType(SearchMappingType):
seconds_ago_filter = "updated__gte"
list_keys = ["product"]
@classmethod
def get_model(cls):
return Answer
@classmethod
def get_index_group(cls):
return "metrics"
@classmethod
def get_mapping(cls):
return {
"properties": {
"id": {"type": "long"},
"model": {"type": "string", "index": "not_analyzed"},
"url": {"type": "string", "index": "not_analyzed"},
"indexed_on": {"type": "integer"},
"created": {"type": "date"},
"locale": {"type": "string", "index": "not_analyzed"},
"product": {"type": "string", "index": "not_analyzed"},
"is_solution": {"type": "boolean"},
"creator_id": {"type": "long"},
"by_asker": {"type": "boolean"},
"helpful_count": {"type": "integer"},
"unhelpful_count": {"type": "integer"},
}
}
@classmethod
def extract_document(cls, obj_id, obj=None):
"""Extracts indexable attributes from an Answer."""
fields = ["id", "created", "creator_id", "question_id"]
composed_fields = [
"question__locale",
"question__solution_id",
"question__creator_id",
"question__product_id",
]
all_fields = fields + composed_fields
if obj is None:
model = cls.get_model()
obj_dict = model.objects.values(*all_fields).get(pk=obj_id)
else:
obj_dict = dict([(field, getattr(obj, field)) for field in fields])
obj_dict["question__locale"] = obj.question.locale
obj_dict["question__solution_id"] = obj.question.solution_id
obj_dict["question__creator_id"] = obj.question.creator_id
obj_dict["question__product_id"] = obj.question.product_id
d = {}
d["id"] = obj_dict["id"]
d["model"] = cls.get_mapping_type_name()
# We do this because get_absolute_url is an instance method
# and we don't want to create an instance because it's a DB
# hit and expensive. So we do it by hand. get_absolute_url
# doesn't change much, so this is probably ok.
url = reverse("questions.details", kwargs={"question_id": obj_dict["question_id"]})
d["url"] = urlparams(url, hash="answer-%s" % obj_dict["id"])
d["indexed_on"] = int(time.time())
d["created"] = obj_dict["created"]
d["locale"] = obj_dict["question__locale"]
d["is_solution"] = obj_dict["id"] == obj_dict["question__solution_id"]
d["creator_id"] = obj_dict["creator_id"]
d["by_asker"] = obj_dict["creator_id"] == obj_dict["question__creator_id"]
products = Product.objects.filter(id=obj_dict["question__product_id"])
d["product"] = [p.slug for p in products]
related_votes = AnswerVote.objects.filter(answer_id=obj_dict["id"])
d["helpful_count"] = related_votes.filter(helpful=True).count()
d["unhelpful_count"] = related_votes.filter(helpful=False).count()
return d
register_for_indexing("answers", Answer)
# This below is needed to update the is_solution field on the answer.
register_for_indexing("answers", Question, instance_to_indexee=(lambda i: i.solution))
register_for_indexing("questions", Answer, instance_to_indexee=lambda a: a.question)
# This below is needed to update the is_solution field on the answer.
def reindex_questions_answers(sender, instance, **kw):
"""When a question is saved, we need to reindex it's answers.
This is needed because the solution may have changed."""
if instance.id:
answer_ids = instance.answers.all().values_list("id", flat=True)
index_task.delay(to_class_path(AnswerMetricsMappingType), list(answer_ids))
post_save.connect(
reindex_questions_answers, sender=Question, dispatch_uid="questions_reindex_answers"
)
def user_pre_save(sender, instance, **kw):
"""When a user's username is changed, we must reindex the questions
they participated in.
@ -1319,9 +1029,6 @@ class QuestionVote(ModelBase):
VoteMetadata.objects.create(vote=self, key=key, value=value[:VOTE_METADATA_MAX_LENGTH])
register_for_indexing("questions", QuestionVote, instance_to_indexee=lambda v: v.question)
class AnswerVote(ModelBase):
"""Helpful or Not Helpful vote on Answer."""
@ -1337,13 +1044,6 @@ class AnswerVote(ModelBase):
VoteMetadata.objects.create(vote=self, key=key, value=value[:VOTE_METADATA_MAX_LENGTH])
# TODO: We only need to update the helpful bit. It's possible
# we could ignore all AnswerVotes that aren't helpful and if
# they're marked as helpful, then update the index. Look into
# this.
register_for_indexing("questions", AnswerVote, instance_to_indexee=lambda v: v.answer.question)
class VoteMetadata(ModelBase):
"""Metadata for question and answer votes."""

View File

@ -1,250 +0,0 @@
from datetime import datetime, timedelta
from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.products.tests import ProductFactory
from kitsune.questions.models import QuestionMappingType, AnswerMetricsMappingType
from kitsune.questions.tests import (
QuestionFactory,
AnswerFactory,
AnswerVoteFactory,
QuestionVoteFactory,
)
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.users.models import Profile
from kitsune.users.tests import UserFactory
class QuestionUpdateTests(ElasticTestCase):
def test_added(self):
search = QuestionMappingType.search()
# Create a question--that adds one document to the index.
q = QuestionFactory(title="Does this test work?")
self.refresh()
eq_(search.count(), 1)
eq_(search.query(question_title__match="test").count(), 1)
# No answer exist, so none should be searchable.
eq_(search.query(question_answer_content__match="only").count(), 0)
# Create an answer for the question. It should be searchable now.
AnswerFactory(content="There's only one way to find out!", question=q)
self.refresh()
eq_(search.query(question_answer_content__match="only").count(), 1)
# Make sure that there's only one question document in the index--creating an answer
# should have updated the existing one.
eq_(search.count(), 1)
def test_question_no_answers_deleted(self):
search = QuestionMappingType.search()
q = QuestionFactory(title="Does this work?")
self.refresh()
eq_(search.query(question_title__match="work").count(), 1)
q.delete()
self.refresh()
eq_(search.query(question_title__match="work").count(), 0)
def test_question_one_answer_deleted(self):
search = QuestionMappingType.search()
q = QuestionFactory(title="are model makers the new pink?")
a = AnswerFactory(content="yes.", question=q)
self.refresh()
# Question and its answers are a single document--so the index count should be only 1.
eq_(search.query(question_title__match="pink").count(), 1)
# After deleting the answer, the question document should remain.
a.delete()
self.refresh()
eq_(search.query(question_title__match="pink").count(), 1)
# Delete the question and it should be removed from the index.
q.delete()
self.refresh()
eq_(search.query(question_title__match="pink").count(), 0)
def test_question_questionvote(self):
search = QuestionMappingType.search()
# Create a question and verify it doesn't show up in a
# query for num_votes__gt=0.
q = QuestionFactory(title="model makers will inherit the earth")
self.refresh()
eq_(search.filter(question_num_votes__gt=0).count(), 0)
# Add a QuestionVote--it should show up now.
QuestionVoteFactory(question=q)
self.refresh()
eq_(search.filter(question_num_votes__gt=0).count(), 1)
def test_questions_tags(self):
"""Make sure that adding tags to a Question causes it to
refresh the index.
"""
tag = "hiphop"
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
q = QuestionFactory()
self.refresh()
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
q.tags.add(tag)
self.refresh()
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 1)
q.tags.remove(tag)
self.refresh()
eq_(QuestionMappingType.search().filter(question_tag=tag).count(), 0)
def test_question_is_unindexed_on_creator_delete(self):
search = QuestionMappingType.search()
q = QuestionFactory(title="Does this work?")
self.refresh()
eq_(search.query(question_title__match="work").count(), 1)
q.creator.delete()
self.refresh()
eq_(search.query(question_title__match="work").count(), 0)
def test_question_is_reindexed_on_username_change(self):
search = QuestionMappingType.search()
u = UserFactory(username="dexter")
QuestionFactory(creator=u, title="Hello")
AnswerFactory(creator=u, content="I love you")
self.refresh()
eq_(search.query(question_title__match="hello")[0]["question_creator"], "dexter")
query = search.query(question_answer_content__match="love")
eq_(query[0]["question_answer_creator"], ["dexter"])
# Change the username and verify the index.
u.username = "walter"
u.save()
self.refresh()
eq_(search.query(question_title__match="hello")[0]["question_creator"], "walter")
query = search.query(question_answer_content__match="love")
eq_(query[0]["question_answer_creator"], ["walter"])
def test_question_spam_is_unindexed(self):
search = QuestionMappingType.search()
q = QuestionFactory(title="I am spam")
self.refresh()
eq_(search.query(question_title__match="spam").count(), 1)
q.is_spam = True
q.save()
self.refresh()
eq_(search.query(question_title__match="spam").count(), 0)
def test_answer_spam_is_unindexed(self):
search = QuestionMappingType.search()
a = AnswerFactory(content="I am spam")
self.refresh()
eq_(search.query(question_answer_content__match="spam").count(), 1)
a.is_spam = True
a.save()
self.refresh()
eq_(search.query(question_answer_content__match="spam").count(), 0)
class QuestionSearchTests(ElasticTestCase):
"""Tests about searching for questions"""
def test_case_insensitive_search(self):
"""Ensure the default searcher is case insensitive."""
q = QuestionFactory(title="lolrus", content="I am the lolrus.")
AnswerVoteFactory(answer__question=q)
self.refresh()
# This is an AND operation
result = QuestionMappingType.search().query(
question_title__match="LOLRUS", question_content__match="LOLRUS"
)
assert result.count() > 0
class AnswerMetricsTests(ElasticTestCase):
def test_add_and_delete(self):
"""Adding an answer should add it to the index.
Deleting should delete it.
"""
a = AnswerFactory()
self.refresh()
eq_(AnswerMetricsMappingType.search().count(), 1)
a.delete()
self.refresh()
eq_(AnswerMetricsMappingType.search().count(), 0)
def test_data_in_index(self):
"""Verify the data we are indexing."""
p = ProductFactory()
q = QuestionFactory(locale="pt-BR", product=p)
a = AnswerFactory(question=q)
self.refresh()
eq_(AnswerMetricsMappingType.search().count(), 1)
data = AnswerMetricsMappingType.search()[0]
eq_(data["locale"], q.locale)
eq_(data["product"], [p.slug])
eq_(data["creator_id"], a.creator_id)
eq_(data["is_solution"], False)
eq_(data["by_asker"], False)
# Mark as solution and verify
q.solution = a
q.save()
self.refresh()
data = AnswerMetricsMappingType.search()[0]
eq_(data["is_solution"], True)
# Make the answer creator to be the question creator and verify.
a.creator = q.creator
a.save()
self.refresh()
data = AnswerMetricsMappingType.search()[0]
eq_(data["by_asker"], True)
class SupportForumTopContributorsTests(ElasticTestCase):
client_class = LocalizingClient
def test_top_contributors(self):
# There should be no top contributors since there are no answers.
response = self.client.get(reverse("questions.list", args=["all"]))
eq_(200, response.status_code)
doc = pq(response.content)
eq_(0, len(doc("#top-contributors ol li")))
# Add an answer; we now have a top contributor.
a = AnswerFactory()
self.refresh()
response = self.client.get(reverse("questions.list", args=["all"]))
eq_(200, response.status_code)
doc = pq(response.content)
lis = doc("#top-contributors ol li")
eq_(1, len(lis))
eq_(Profile.objects.get(user=a.creator).display_name, lis[0].text)
# Make the answer 91 days old. There should be no top contributors.
a.created = datetime.now() - timedelta(days=91)
a.save()
self.refresh()
response = self.client.get(reverse("questions.list", args=["all"]))
eq_(200, response.status_code)
doc = pq(response.content)
eq_(0, len(doc("#top-contributors ol li")))

View File

@ -30,7 +30,7 @@ from kitsune.questions.tests import (
TestCaseBase,
tags_eq,
)
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo import googleanalytics
from kitsune.sumo.tests import TestCase
from kitsune.tags.tests import TagFactory
@ -506,7 +506,9 @@ class AddExistingTagTests(TestCaseBase):
add_existing_tag("nonexistent tag", self.untagged_question.tags)
class OldQuestionsArchiveTest(ElasticTestCase):
class OldQuestionsArchiveTest(Elastic7TestCase):
search_tests = True
def test_archive_old_questions(self):
last_updated = datetime.now() - timedelta(days=100)

View File

@ -3,13 +3,12 @@ import json
import random
from datetime import datetime, timedelta
from string import ascii_letters
from unittest import mock
from django.conf import settings
from django.contrib.auth.models import User
from django.core import mail
from django.core.cache import cache
from unittest import mock
from nose.tools import eq_
from pyquery import PyQuery as pq
from taggit.models import Tag
@ -26,7 +25,7 @@ from kitsune.questions.tests import (
tags_eq,
)
from kitsune.questions.views import NO_TAG, UNAPPROVED_TAG
from kitsune.search.tests import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.templatetags.jinja_helpers import urlparams
from kitsune.sumo.tests import (
LocalizingClient,
@ -1454,7 +1453,9 @@ class ProductForumTemplateTestCase(TestCaseBase):
assert openbadges.title not in product_list_html
class RelatedThingsTestCase(ElasticTestCase):
class RelatedThingsTestCase(Elastic7TestCase):
search_tests = True
def setUp(self):
super(RelatedThingsTestCase, self).setUp()
self.question = QuestionFactory(
@ -1484,7 +1485,6 @@ class RelatedThingsTestCase(ElasticTestCase):
AnswerVoteFactory(answer=a3, helpful=True)
cache.clear()
self.refresh()
response = get(self.client, "questions.details", args=[self.question.id])
doc = pq(response.content)
@ -1502,7 +1502,6 @@ class RelatedThingsTestCase(ElasticTestCase):
d1.save()
cache.clear()
self.refresh()
response = get(self.client, "questions.details", args=[self.question.id])
doc = pq(response.content)

View File

@ -16,17 +16,15 @@ from kitsune.questions.tests import (
TestCaseBase,
)
from kitsune.questions.views import parse_troubleshooting
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.search.v2.tests import Elastic7TestCase
from kitsune.sumo.templatetags.jinja_helpers import urlparams
from kitsune.sumo.tests import LocalizingClient, eq_msg, get, template_used
from kitsune.sumo.urlresolvers import reverse
from kitsune.users.tests import UserFactory, add_permission
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
# Note:
# Tests using the ElasticTestCase are not being run bc of this line: `-a '!search_tests'`
class AAQSearchTests(ElasticTestCase):
class AAQSearchTests(Elastic7TestCase):
search_tests = True
client_class = LocalizingClient
def test_bleaching(self):
@ -66,11 +64,6 @@ class AAQSearchTests(ElasticTestCase):
TopicFactory(title="Fix problems", slug="fix-problems", product=p)
q = QuestionFactory(product=p, title="CupcakesQuestion cupcakes")
d = DocumentFactory(title="CupcakesKB cupcakes", category=10)
d.products.add(p)
RevisionFactory(document=d, is_approved=True)
self.refresh()
url = urlparams(
@ -82,22 +75,17 @@ class AAQSearchTests(ElasticTestCase):
eq_(200, response.status_code)
assert b"CupcakesQuestion" in response.content
assert b"CupcakesKB" in response.content
# Verify that archived articles and questions aren't shown...
# Archive both and they shouldn't appear anymore.
q.is_archived = True
q.save()
d.is_archived = True
d.save()
self.refresh()
response = self.client.get(url, follow=True)
eq_(200, response.status_code)
assert b"CupcakesQuestion" not in response.content
assert b"CupcakesKB" not in response.content
def test_search_suggestion_questions_locale(self):
"""Verifies the right languages show up in search suggestions."""
@ -683,7 +671,8 @@ class TestRateLimiting(TestCaseBase):
eq_(4, Answer.objects.count())
class TestStats(ElasticTestCase):
class TestStats(Elastic7TestCase):
search_tests = True
client_class = LocalizingClient
def test_stats(self):

View File

@ -1,383 +0,0 @@
import logging
import time
from datetime import datetime
import requests
from django.conf import settings
from django.contrib import admin
from django.core.exceptions import PermissionDenied
from django.http import HttpResponseRedirect, Http404
from django.shortcuts import render
from kitsune.search import synonym_utils
from kitsune.search.es_utils import (
get_doctype_stats,
get_indexes,
delete_index,
ES_EXCEPTIONS,
get_indexable,
CHUNK_SIZE,
recreate_indexes,
write_index,
read_index,
all_read_indexes,
all_write_indexes,
)
from kitsune.search.models import Record, get_mapping_types, Synonym
from kitsune.search.tasks import index_chunk_task, update_synonyms_task
from kitsune.search.utils import chunked, to_class_path
log = logging.getLogger("k.es")
def handle_reset(request):
"""Resets records"""
for rec in Record.objects.outstanding():
rec.mark_fail("Cancelled.")
return HttpResponseRedirect(request.path)
class DeleteError(Exception):
pass
def create_batch_id():
"""Returns a batch_id"""
# TODO: This is silly, but it's a good enough way to distinguish
# between batches by looking at a Record. This is just over the
# number of seconds in a day.
return str(int(time.time()))[-6:]
def handle_delete(request):
"""Deletes an index"""
index_to_delete = request.POST.get("delete_index")
es_indexes = [name for (name, count) in get_indexes()]
# Rule 1: Has to start with the ES_INDEX_PREFIX.
if not index_to_delete.startswith(settings.ES_INDEX_PREFIX):
raise DeleteError('"%s" is not a valid index name.' % index_to_delete)
# Rule 2: Must be an existing index.
if index_to_delete not in es_indexes:
raise DeleteError('"%s" does not exist.' % index_to_delete)
# Rule 3: Don't delete the default read index.
# TODO: When the critical index exists, this should be "Don't
# delete the critical read index."
if index_to_delete == read_index("default"):
raise DeleteError('"%s" is the default read index.' % index_to_delete)
# The index is ok to delete
delete_index(index_to_delete)
return HttpResponseRedirect(request.path)
class ReindexError(Exception):
pass
def reindex(mapping_type_names):
"""Reindex all instances of a given mapping type with celery tasks
:arg mapping_type_names: list of mapping types to reindex
"""
outstanding = Record.objects.outstanding().count()
if outstanding > 0:
raise ReindexError("There are %s outstanding chunks." % outstanding)
batch_id = create_batch_id()
# Break up all the things we want to index into chunks. This
# chunkifies by class then by chunk size.
chunks = []
for cls, indexable in get_indexable(mapping_types=mapping_type_names):
chunks.extend((cls, chunk) for chunk in chunked(indexable, CHUNK_SIZE))
for cls, id_list in chunks:
index = cls.get_index()
chunk_name = "Indexing: %s %d -> %d" % (
cls.get_mapping_type_name(),
id_list[0],
id_list[-1],
)
rec = Record.objects.create(batch_id=batch_id, name=chunk_name)
index_chunk_task.delay(index, batch_id, rec.id, (to_class_path(cls), id_list))
def handle_recreate_index(request):
"""Deletes an index, recreates it, and reindexes it."""
groups = [
name.replace("check_", "")
for name in list(request.POST.keys())
if name.startswith("check_")
]
indexes = [write_index(group) for group in groups]
recreate_indexes(indexes=indexes)
mapping_types_names = [
mt.get_mapping_type_name() for mt in get_mapping_types() if mt.get_index_group() in groups
]
reindex(mapping_types_names)
return HttpResponseRedirect(request.path)
def handle_reindex(request):
"""Caculates and kicks off indexing tasks"""
mapping_type_names = [
name.replace("check_", "")
for name in list(request.POST.keys())
if name.startswith("check_")
]
reindex(mapping_type_names)
return HttpResponseRedirect(request.path)
def search(request):
"""Render the admin view containing search tools"""
if not request.user.has_perm("search.reindex"):
raise PermissionDenied
error_messages = []
stats = {}
if "reset" in request.POST:
try:
return handle_reset(request)
except ReindexError as e:
error_messages.append("Error: %s" % e.message)
if "reindex" in request.POST:
try:
return handle_reindex(request)
except ReindexError as e:
error_messages.append("Error: %s" % e.message)
if "recreate_index" in request.POST:
try:
return handle_recreate_index(request)
except ReindexError as e:
error_messages.append("Error: %s" % e.message)
if "delete_index" in request.POST:
try:
return handle_delete(request)
except DeleteError as e:
error_messages.append("Error: %s" % e.message)
except ES_EXCEPTIONS as e:
error_messages.append("Error: {0}".format(repr(e)))
stats = None
write_stats = None
es_deets = None
indexes = []
try:
# TODO: SUMO has a single ES_URL and that's the ZLB and does
# the balancing. If that ever changes and we have multiple
# ES_URLs, then this should get fixed.
es_deets = requests.get(settings.ES_URLS[0]).json()
except requests.exceptions.RequestException:
pass
stats = {}
for index in all_read_indexes():
try:
stats[index] = get_doctype_stats(index)
except ES_EXCEPTIONS:
stats[index] = None
write_stats = {}
for index in all_write_indexes():
try:
write_stats[index] = get_doctype_stats(index)
except ES_EXCEPTIONS:
write_stats[index] = None
try:
indexes = get_indexes()
indexes.sort(key=lambda m: m[0])
except ES_EXCEPTIONS as e:
error_messages.append("Error: {0}".format(repr(e)))
recent_records = Record.objects.all()[:100]
outstanding_records = Record.objects.outstanding()
index_groups = set(settings.ES_INDEXES.keys())
index_groups |= set(settings.ES_WRITE_INDEXES.keys())
index_group_data = [[group, read_index(group), write_index(group)] for group in index_groups]
return render(
request,
"admin/search_maintenance.html",
{
"title": "Search",
"es_deets": es_deets,
"doctype_stats": stats,
"doctype_write_stats": write_stats,
"indexes": indexes,
"index_groups": index_groups,
"index_group_data": index_group_data,
"read_indexes": all_read_indexes,
"write_indexes": all_write_indexes,
"error_messages": error_messages,
"recent_records": recent_records,
"outstanding_records": outstanding_records,
"now": datetime.now(),
"read_index": read_index,
"write_index": write_index,
},
)
admin.site.register_view(path="search-maintenance", view=search, name="Search - Index Maintenance")
def _fix_results(results):
"""Fixes up the S results for better templating
1. extract the results_dict from the DefaultMappingType
and returns that as a dict
2. turns datestamps into Python datetime objects
Note: This abuses ElasticUtils DefaultMappingType by using
the private _results_dict.
"""
results = [obj._results_dict for obj in results]
for obj in results:
# Convert datestamps (which are in seconds since epoch) to
# Python datetime objects.
for key in ("indexed_on", "created", "updated"):
if key in obj and not isinstance(obj[key], datetime):
obj[key] = datetime.fromtimestamp(int(obj[key]))
return results
def index_view(request):
requested_bucket = request.GET.get("bucket", "")
requested_id = request.GET.get("id", "")
last_20_by_bucket = None
data = None
bucket_to_model = dict([(cls.get_mapping_type_name(), cls) for cls in get_mapping_types()])
if requested_bucket and requested_id:
# Nix whitespace because I keep accidentally picking up spaces
# when I copy and paste.
requested_id = requested_id.strip()
# The user wants to see a specific item in the index, so we
# attempt to fetch it from the index and show that
# specifically.
if requested_bucket not in bucket_to_model:
raise Http404
cls = bucket_to_model[requested_bucket]
data = list(cls.search().filter(id=requested_id))
if not data:
raise Http404
data = _fix_results(data)[0]
else:
# Create a list of (class, list-of-dicts) showing us the most
# recently indexed items for each bucket. We only display the
# id, title and indexed_on fields, so only pull those back from
# ES.
last_20_by_bucket = [
(cls_name, _fix_results(cls.search().order_by("-indexed_on")[:20]))
for cls_name, cls in list(bucket_to_model.items())
]
return render(
request,
"admin/search_index.html",
{
"title": "Index Browsing",
"buckets": [cls_name for cls_name, cls in list(bucket_to_model.items())],
"last_20_by_bucket": last_20_by_bucket,
"requested_bucket": requested_bucket,
"requested_id": requested_id,
"requested_data": data,
},
)
admin.site.register_view(path="search-index", view=index_view, name="Search - Index Browsing")
class SynonymAdmin(admin.ModelAdmin):
list_display = ("id", "from_words", "to_words")
list_display_links = ("id",)
list_editable = ("from_words", "to_words")
ordering = ("from_words", "id")
admin.site.register(Synonym, SynonymAdmin)
def synonym_editor(request):
parse_errors = []
all_synonyms = Synonym.objects.all()
if "sync_synonyms" in request.POST:
# This is a task. Normally we would call tasks asynchronously, right?
# In this case, since it runs quickly and is in the admin interface,
# the advantage of it being run in the request/response cycle
# outweighs the delay in responding. If this becomes a problem
# we should make a better UI and make this .delay() again.
update_synonyms_task()
return HttpResponseRedirect(request.path)
synonyms_text = request.POST.get("synonyms_text")
if synonyms_text is not None:
db_syns = set((s.from_words, s.to_words) for s in all_synonyms)
try:
post_syns = set(synonym_utils.parse_synonyms(synonyms_text))
except synonym_utils.SynonymParseError as e:
parse_errors = e.errors
else:
syns_to_add = post_syns - db_syns
syns_to_remove = db_syns - post_syns
for (from_words, to_words) in syns_to_remove:
# This uses .get() because I want it to blow up if
# there isn't exactly 1 matching synonym.
(Synonym.objects.get(from_words=from_words, to_words=to_words).delete())
for (from_words, to_words) in syns_to_add:
Synonym(from_words=from_words, to_words=to_words).save()
return HttpResponseRedirect(request.path)
# If synonyms_text is not None, it came from POST, and there were
# errors. It shouldn't be modified, so the error messages make sense.
if synonyms_text is None:
synonyms_text = "\n".join(str(s) for s in all_synonyms)
synonym_add_count, synonym_remove_count = synonym_utils.count_out_of_date()
return render(
request,
"admin/search_synonyms.html",
{
"synonyms_text": synonyms_text,
"errors": parse_errors,
"synonym_add_count": synonym_add_count,
"synonym_remove_count": synonym_remove_count,
},
)
admin.site.register_view(path="synonym_bulk", view=synonym_editor, name="Search - Synonym Editor")

View file

@@ -1,163 +0,0 @@
from django.conf import settings
from elasticsearch import RequestsHttpConnection
from rest_framework import serializers
from rest_framework.decorators import api_view
from rest_framework.response import Response
from kitsune.products.models import Product
from kitsune.questions.models import Question, QuestionMappingType
from kitsune.questions.api import QuestionSerializer
from kitsune.search import es_utils
from kitsune.sumo.api_utils import GenericAPIException
from kitsune.wiki.api import DocumentDetailSerializer
from kitsune.wiki.models import Document, DocumentMappingType
def positive_integer(value):
if value < 0:
raise serializers.ValidationError("This field must be positive.")
def valid_product(value):
if not value:
return
if not Product.objects.filter(slug=value).exists():
raise serializers.ValidationError('Could not find product with slug "{0}".'.format(value))
def valid_locale(value):
if not value:
return
if value not in settings.SUMO_LANGUAGES:
if value in settings.NON_SUPPORTED_LOCALES:
fallback = settings.NON_SUPPORTED_LOCALES[value] or settings.WIKI_DEFAULT_LANGUAGE
raise serializers.ValidationError(
'"{0}" is not supported, but has fallback locale "{1}".'.format(value, fallback)
)
else:
raise serializers.ValidationError('Could not find locale "{0}".'.format(value))
class SuggestSerializer(serializers.Serializer):
q = serializers.CharField(required=True)
locale = serializers.CharField(
required=False, default=settings.WIKI_DEFAULT_LANGUAGE, validators=[valid_locale]
)
product = serializers.CharField(required=False, default="", validators=[valid_product])
max_questions = serializers.IntegerField(
required=False, default=10, validators=[positive_integer]
)
max_documents = serializers.IntegerField(
required=False, default=10, validators=[positive_integer]
)
@api_view(["GET", "POST"])
def suggest(request):
if request.data and request.GET:
raise GenericAPIException(
400, "Put all parameters either in the querystring or the HTTP request body."
)
serializer = SuggestSerializer(data=(request.data or request.GET))
if not serializer.is_valid():
raise GenericAPIException(400, serializer.errors)
searcher = (
es_utils.AnalyzerS()
.es(
urls=settings.ES_URLS,
timeout=settings.ES_TIMEOUT,
use_ssl=settings.ES_USE_SSL,
http_auth=settings.ES_HTTP_AUTH,
connection_class=RequestsHttpConnection,
)
.indexes(es_utils.read_index("default"))
)
data = serializer.validated_data
return Response(
{
"questions": _question_suggestions(
searcher, data["q"], data["locale"], data["product"], data["max_questions"]
),
"documents": _document_suggestions(
searcher, data["q"], data["locale"], data["product"], data["max_documents"]
),
}
)
def _question_suggestions(searcher, text, locale, product, max_results):
if max_results <= 0:
return []
search_filter = es_utils.F(
model="questions_question",
question_is_archived=False,
question_is_locked=False,
question_is_solved=True,
)
if product:
search_filter &= es_utils.F(product=product)
if locale:
search_filter &= es_utils.F(question_locale=locale)
questions = []
searcher = _query(searcher, QuestionMappingType, search_filter, text, locale)
question_ids = [result["id"] for result in searcher[:max_results]]
questions = [
QuestionSerializer(instance=q).data for q in Question.objects.filter(id__in=question_ids)
]
return questions
def _document_suggestions(searcher, text, locale, product, max_results):
if max_results <= 0:
return []
search_filter = es_utils.F(
model="wiki_document",
document_category__in=settings.SEARCH_DEFAULT_CATEGORIES,
document_locale=locale,
document_is_archived=False,
)
if product:
search_filter &= es_utils.F(product=product)
documents = []
searcher = _query(searcher, DocumentMappingType, search_filter, text, locale)
doc_ids = [result["id"] for result in searcher[:max_results]]
documents = [
DocumentDetailSerializer(instance=doc).data
for doc in Document.objects.filter(id__in=doc_ids)
]
return documents
def _query(searcher, mapping_type, search_filter, query_text, locale):
query_fields = mapping_type.get_query_fields()
query = {}
for field in query_fields:
for query_type in ["match", "match_phrase"]:
key = "{0}__{1}".format(field, query_type)
query[key] = query_text
# Transform query to be locale aware.
query = es_utils.es_query_with_analyzer(query, locale)
return (
searcher.doctypes(mapping_type.get_mapping_type_name())
.filter(search_filter)
.query(should=True, **query)
)
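# Sketch of the query dict built above, assuming (hypothetically) that
# get_query_fields() returns ["document_title"] and query_text is "crash":
#
#   {"document_title__match": "crash",
#    "document_title__match_phrase": "crash"}
#
# es_query_with_analyzer() then rewrites localized fields to their
# *_analyzer variants before the query is handed to the searcher.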

View file

@@ -372,6 +372,5 @@ ES_LOCALE_ANALYZERS = {
}
DEFAULT_ES7_CONNECTION = "es7_default"
# default refresh_interval for all indices
DEFAULT_ES7_REFRESH_INTERVAL = "60s"

View file

@@ -1,891 +0,0 @@
import json
import logging
import pprint
import time
from functools import wraps
import requests
from django.conf import settings
from django.db import reset_queries
from django.http import HttpResponse
from django.shortcuts import render
from django.utils.translation import ugettext as _
from elasticutils import S as UntypedS
from elasticutils.contrib.django import ES_EXCEPTIONS, F, S, get_es # noqa
from kitsune.search import config
from kitsune.search.utils import chunked
# These used to be constants, but that was problematic. Things like
# tests want to be able to dynamically change settings at run time,
# which isn't possible if these are constants.
def read_index(group):
"""Gets the name of the read index for a group."""
return "%s_%s" % (settings.ES_INDEX_PREFIX, settings.ES_INDEXES[group])
def write_index(group):
"""Gets the name of the write index for a group."""
return "%s_%s" % (settings.ES_INDEX_PREFIX, settings.ES_WRITE_INDEXES[group])
def all_read_indexes():
return [read_index(group) for group in list(settings.ES_INDEXES.keys())]
def all_write_indexes():
return [write_index(group) for group in list(settings.ES_WRITE_INDEXES.keys())]
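# Illustration of the naming scheme, assuming (hypothetically) that
# ES_INDEX_PREFIX = "sumo" and ES_INDEXES = {"default": "main"}:
#
#   read_index("default")   # -> "sumo_main"
#   all_read_indexes()      # -> ["sumo_main"]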
# The number of things in a chunk. This is for parallel indexing via
# the admin.
CHUNK_SIZE = 20000
log = logging.getLogger("k.search.es")
class MappingMergeError(Exception):
"""Represents a mapping merge error"""
pass
class UnindexMeBro(Exception):
"""Raise in extract_document when doc should be removed."""
pass
class AnalyzerMixin(object):
def _with_analyzer(self, key, val, action):
"""Do a normal kind of query, with a analyzer added.
:arg key: is the field being searched
:arg val: Is a two-tupe of the text to query for and the name of
the analyzer to use.
:arg action: is the type of query being performed, like match or
match_phrase
"""
query, analyzer = val
clause = {
action: {
key: {
"query": query,
"analyzer": analyzer,
}
}
}
boost = self.field_boosts.get(key)
if boost is not None:
clause[action][key]["boost"] = boost
return clause
def process_query_match_phrase_analyzer(self, key, val, action):
"""A match phrase query that includes an analyzer."""
return self._with_analyzer(key, val, "match_phrase")
def process_query_match_analyzer(self, key, val, action):
"""A match query that includes an analyzer."""
return self._with_analyzer(key, val, "match")
def process_query_sqs(self, key, val, action):
"""Implements simple_query_string query"""
return {
"simple_query_string": {
"fields": [key],
"query": val,
"default_operator": "or",
}
}
def process_query_sqs_analyzer(self, key, val, action):
"""Implements sqs query that includes an analyzer"""
query, analyzer = val
return {
"simple_query_string": {
"fields": [key],
"query": query,
"analyzer": analyzer,
"default_operator": "or",
}
}
def process_query_match_whitespace(self, key, val, action):
"""A match query that uses the whitespace analyzer."""
return {
"match": {
key: {
"query": val,
"analyzer": "whitespace",
}
}
}
class Sphilastic(S, AnalyzerMixin):
"""Shim around elasticutils.contrib.django.S.
Implements some Kitsune-specific behavior to make our lives
easier.
.. Note::
This looks at the read index. If you need to look at something
different, build your own S.
"""
def print_query(self):
pprint.pprint(self._build_query())
def get_indexes(self):
# Sphilastic is a searcher and so it's _always_ used in
# a read context. Therefore, we always return the read index.
return [read_index(self.type.get_index_group())]
def process_query_mlt(self, key, val, action):
"""Add support for a more like this query to our S.
val is expected to be a dict like:
{
'fields': ['field1', 'field2'],
'like_text': 'text like this one',
}
"""
return {
"more_like_this": val,
}
class AnalyzerS(UntypedS, AnalyzerMixin):
"""This is to give the search view support for setting the analyzer.
This differs from Sphilastic in that this is a plain ES S object,
not based on Django.
This just exists as a way to mix together UntypedS and AnalyzerMixin.
"""
pass
def get_mappings(index):
mappings = {}
from kitsune.search.models import get_mapping_types
for cls in get_mapping_types():
group = cls.get_index_group()
if index == write_index(group) or index == read_index(group):
mappings[cls.get_mapping_type_name()] = cls.get_mapping()
return mappings
def get_all_mappings():
mappings = {}
from kitsune.search.models import get_mapping_types
for cls in get_mapping_types():
mappings[cls.get_mapping_type_name()] = cls.get_mapping()
return mappings
def get_indexes(all_indexes=False):
"""Query ES to get a list of indexes that actually exist.
:returns: A dict like {index_name: document_count}.
"""
es = get_es()
status = es.indices.status()
indexes = status["indices"]
if not all_indexes:
indexes = dict(
(k, v) for k, v in list(indexes.items()) if k.startswith(settings.ES_INDEX_PREFIX)
)
return [(name, value["docs"]["num_docs"]) for name, value in list(indexes.items())]
def get_doctype_stats(index):
"""Returns a dict of name -> count for documents indexed.
For example:
>>> get_doctype_stats()
{'questions_question': 14216, 'forums_thread': 419, 'wiki_document': 759}
:throws elasticsearch.exceptions.ConnectionError: if there is a
connection error, including a timeout.
:throws elasticsearch.exceptions.NotFound: if the index doesn't exist
"""
stats = {}
from kitsune.search.models import get_mapping_types
for cls in get_mapping_types():
if cls.get_index() == index:
# Note: Can't use cls.search() here since that returns a
# Sphilastic which is hard-coded to look only at the
# read index.
s = S(cls).indexes(index)
stats[cls.get_mapping_type_name()] = s.count()
return stats
def delete_index(index):
get_es().indices.delete(index=index, ignore=[404])
def format_time(time_to_go):
"""Returns minutes and seconds string for given time in seconds"""
if time_to_go < 60:
return "%ds" % time_to_go
return "%dm %ds" % (time_to_go / 60, time_to_go % 60)
def get_documents(cls, ids):
"""Returns a list of ES documents with specified ids and doctype
:arg cls: the mapping type class with a ``.search()`` to use
:arg ids: the list of ids to retrieve documents for
:returns: list of documents as dicts
"""
# FIXME: We pull the field names from the mapping, but I'm not
# sure if this works in all cases or not and it's kind of hacky.
fields = list(cls.get_mapping()["properties"].keys())
ret = cls.search().filter(id__in=ids).values_dict(*fields)[: len(ids)]
return cls.reshape(ret)
def get_analysis():
"""Generate all our custom analyzers, tokenizers, and filters
These are variants of the Snowball analyzer for various languages,
but could also include custom analyzers if the need arises.
"""
analyzers = {}
filters = {}
# The keys are locales to look up to decide the analyzer's name.
# The values are the language name to set for Snowball.
snowball_langs = {
"eu": "Basque",
"ca": "Catalan",
"da": "Danish",
"nl": "Dutch",
"en-US": "English",
"fi": "Finnish",
"fr": "French",
"de": "German",
"hu": "Hungarian",
"it": "Italian",
"no": "Norwegian",
"pt-BR": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"es": "Spanish",
"sv": "Swedish",
"tr": "Turkish",
}
for locale, language in list(snowball_langs.items()):
analyzer_name = es_analyzer_for_locale(locale, synonyms=False)
analyzers[analyzer_name] = {
"type": "snowball",
"language": language,
}
# The snowball analyzer is actually just a shortcut that does
# a particular set of tokenizers and analyzers. According to
# the docs, the below is the same as that, plus synonym handling.
if locale in config.ES_SYNONYM_LOCALES:
analyzer_name = es_analyzer_for_locale(locale, synonyms=True)
analyzers[analyzer_name] = {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"synonyms-" + locale,
"stop",
"snowball-" + locale,
],
}
for locale in config.ES_SYNONYM_LOCALES:
filter_name, filter_body = es_get_synonym_filter(locale)
filters[filter_name] = filter_body
filters["snowball-" + locale] = {
"type": "snowball",
"language": snowball_langs[locale],
}
# Done!
return {
"analyzer": analyzers,
"filter": filters,
}
def es_get_synonym_filter(locale):
# Avoid circular import
from kitsune.search.models import Synonym
# The synonym filter doesn't like it if the synonyms list is empty.
# If there are no synonyms, just make a no-op filter by making a
# synonym from one word to itself.
# TODO: Someday this should be something like .filter(locale=locale)
synonyms = list(Synonym.objects.all()) or ["firefox => firefox"]
name = "synonyms-" + locale
body = {
"type": "synonym",
"synonyms": [str(s) for s in synonyms],
}
return name, body
def recreate_indexes(es=None, indexes=None):
"""Deletes indexes and recreates them.
:arg es: An ES object to use. Defaults to calling `get_es()`.
:arg indexes: A list of indexes to recreate. Defaults to all write
indexes.
"""
if es is None:
es = get_es()
if indexes is None:
indexes = all_write_indexes()
for index in indexes:
delete_index(index)
# There should be no mapping-conflict race here since the index doesn't
# exist. Live indexing should just fail.
# Simultaneously create the index, the mappings, the analyzers, and
# the tokenizers, so live indexing doesn't get a chance to index
# anything between and infer a bogus mapping (which ES then freaks
# out over when we try to lay in an incompatible explicit mapping).
es.indices.create(
index=index,
body={
"mappings": get_mappings(index),
"settings": {
"analysis": get_analysis(),
},
},
)
# Wait until the index is there.
es.cluster.health(wait_for_status="yellow")
def get_index_settings(index):
"""Returns ES settings for this index"""
return get_es().indices.get_settings(index=index).get(index, {}).get("settings", {})
def get_indexable(percent=100, seconds_ago=0, mapping_types=None):
"""Returns a list of (class, iterable) for all the things to index
:arg percent: Defaults to 100. Allows you to specify how much of
each doctype you want to index. This is useful for
development where doing a full reindex takes an hour.
:arg mapping_types: The list of mapping types to index.
"""
from kitsune.search.models import get_mapping_types
# Note: Passing in None will get all the mapping types
mapping_types = get_mapping_types(mapping_types)
to_index = []
percent = float(percent) / 100
for cls in mapping_types:
indexable = cls.get_indexable(seconds_ago=seconds_ago)
if percent < 1:
indexable = indexable[: int(indexable.count() * percent)]
to_index.append((cls, indexable))
return to_index
def index_chunk(cls, id_list, reraise=False):
"""Index a chunk of documents.
:arg cls: The MappingType class.
:arg id_list: Iterable of ids of that MappingType to index.
:arg reraise: False if you want errors to be swallowed and True
if you want errors to be thrown.
"""
# Note: This bulk indexes in batches of 80. I didn't arrive at
# this number through a proper scientific method. It's possible
# there's a better number. It takes a while to fiddle with,
# though. Probably best to expose the number as an environment
# variable, then run a script that takes timings for
# --criticalmass, runs overnight and returns a more "optimal"
# number.
for ids in chunked(id_list, 80):
documents = []
for id_ in ids:
try:
documents.append(cls.extract_document(id_))
except UnindexMeBro:
# extract_document throws this in cases where we need
# to remove the item from the index.
cls.unindex(id_)
except Exception:
log.exception("Unable to extract/index document (id: %d)", id_)
if reraise:
raise
if documents:
cls.bulk_index(documents, id_field="id")
if settings.DEBUG:
# Nix queries so that this doesn't become a complete
# memory hog and make Will's computer sad when DEBUG=True.
reset_queries()
def es_reindex_cmd(
percent=100, delete=False, mapping_types=None, criticalmass=False, seconds_ago=0, log=log
):
"""Rebuild ElasticSearch indexes
:arg percent: 1 to 100--the percentage of the db to index
:arg delete: whether or not to wipe the index before reindexing
:arg mapping_types: list of mapping types to index
:arg criticalmass: whether or not to index just a critical mass of
things
:arg seconds_ago: things updated less than this number of seconds
ago should be reindexed
:arg log: the logger to use
"""
es = get_es()
if mapping_types is None:
indexes = all_write_indexes()
else:
indexes = indexes_for_doctypes(mapping_types)
need_delete = False
for index in indexes:
try:
# This is used to see if the index exists.
get_doctype_stats(index)
except ES_EXCEPTIONS:
if not delete:
log.error('The index "%s" does not exist. ' "You must specify --delete." % index)
need_delete = True
if need_delete:
return
if delete:
log.info("wiping and recreating %s...", ", ".join(indexes))
recreate_indexes(es, indexes)
if criticalmass:
# The critical mass is defined as the entire KB plus the most
# recent 15k questions (which is about how many questions
# were created in the last 180 days). We build that
# indexable here.
# Get only questions and wiki document stuff.
all_indexable = get_indexable(mapping_types=["questions_question", "wiki_document"])
# The first item is questions because we specified that
# order. Old questions don't show up in searches, so we nix
# them by reversing the list (ordered by id ascending) and
# slicing it.
all_indexable[0] = (all_indexable[0][0], list(reversed(all_indexable[0][1]))[:15000])
elif mapping_types:
all_indexable = get_indexable(percent, seconds_ago, mapping_types)
else:
all_indexable = get_indexable(percent, seconds_ago)
try:
old_refreshes = {}
# We're doing a lot of indexing, so we get the refresh_interval of
# the index currently, then nix refreshing. Later we'll restore it.
for index in indexes:
old_refreshes[index] = get_index_settings(index).get("index.refresh_interval", "1s")
# Disable automatic refreshing
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "-1"}})
start_time = time.time()
for cls, indexable in all_indexable:
cls_start_time = time.time()
total = len(indexable)
if total == 0:
continue
chunk_start_time = time.time()
log.info("reindexing %s. %s to index....", cls.get_mapping_type_name(), total)
i = 0
for chunk in chunked(indexable, 1000):
chunk_start_time = time.time()
index_chunk(cls, chunk)
i += len(chunk)
time_to_go = (total - i) * ((time.time() - cls_start_time) / i)
per_1000 = (time.time() - cls_start_time) / (i / 1000.0)
this_1000 = time.time() - chunk_start_time
log.info(
" %s/%s %s... (%s/1000 avg, %s ETA)",
i,
total,
format_time(this_1000),
format_time(per_1000),
format_time(time_to_go),
)
delta_time = time.time() - cls_start_time
log.info(
" done! (%s total, %s/1000 avg)",
format_time(delta_time),
format_time(delta_time / (total / 1000.0)),
)
delta_time = time.time() - start_time
log.info("done! (%s total)", format_time(delta_time))
finally:
# Re-enable automatic refreshing
for index, old_refresh in list(old_refreshes.items()):
es.indices.put_settings(index=index, body={"index": {"refresh_interval": old_refresh}})
def es_delete_cmd(index, noinput=False, log=log):
"""Deletes an index"""
try:
indexes = [name for name, count in get_indexes()]
except ES_EXCEPTIONS:
log.error(
"Your elasticsearch process is not running or ES_URLS "
"is set wrong in your settings_local.py file."
)
return
if index not in indexes:
log.error('Index "%s" is not a valid index.', index)
return
if index in all_read_indexes() and not noinput:
ret = input('"%s" is a read index. Are you sure you want to delete it? (yes/no) ' % index)
if ret != "yes":
log.info("Not deleting the index.")
return
log.info('Deleting index "%s"...', index)
delete_index(index)
log.info("Done!")
def es_status_cmd(checkindex=False, log=log):
"""Shows elastic search index status"""
try:
# TODO: SUMO has a single ES_URL and that's the ZLB and does
# the balancing. If that ever changes and we have multiple
# ES_URLs, then this should get fixed.
es_deets = requests.get(settings.ES_URLS[0]).json()
except requests.exceptions.RequestException:
pass
read_doctype_stats = {}
for index in all_read_indexes():
try:
read_doctype_stats[index] = get_doctype_stats(index)
except ES_EXCEPTIONS:
read_doctype_stats[index] = None
if set(all_read_indexes()) == set(all_write_indexes()):
write_doctype_stats = read_doctype_stats
else:
write_doctype_stats = {}
for index in all_write_indexes():
try:
write_doctype_stats[index] = get_doctype_stats(index)
except ES_EXCEPTIONS:
write_doctype_stats[index] = None
try:
indexes = get_indexes(all_indexes=True)
except ES_EXCEPTIONS:
log.error(
"Your elasticsearch process is not running or ES_URLS "
"is set wrong in your settings_local.py file."
)
return
log.info("Elasticsearch:")
log.info(" Version : %s", es_deets["version"]["number"])
log.info("Settings:")
log.info(" ES_URLS : %s", settings.ES_URLS)
log.info(" ES_INDEX_PREFIX : %s", settings.ES_INDEX_PREFIX)
log.info(" ES_LIVE_INDEXING : %s", settings.ES_LIVE_INDEXING)
log.info(" ES_INDEXES : %s", settings.ES_INDEXES)
log.info(" ES_WRITE_INDEXES : %s", settings.ES_WRITE_INDEXES)
log.info("Index stats:")
if indexes:
log.info(" List of indexes:")
for name, count in sorted(indexes):
read_write = []
if name in all_read_indexes():
read_write.append("READ")
if name in all_write_indexes():
read_write.append("WRITE")
log.info(" %-22s: %s %s", name, count, "/".join(read_write))
else:
log.info(" There are no %s indexes.", settings.ES_INDEX_PREFIX)
if not read_doctype_stats:
read_index_names = ", ".join(all_read_indexes())
log.info(" No read indexes exist. (%s)", read_index_names)
else:
log.info(" Read indexes:")
for index, stats in list(read_doctype_stats.items()):
if stats is None:
log.info(" %s does not exist", index)
else:
log.info(" %s:", index)
for name, count in sorted(stats.items()):
log.info(" %-22s: %d", name, count)
if set(all_read_indexes()) == set(all_write_indexes()):
log.info(" Write indexes are the same as the read indexes.")
else:
if not write_doctype_stats:
write_index_names = ", ".join(all_write_indexes())
log.info(" No write indexes exist. (%s)", write_index_names)
else:
log.info(" Write indexes:")
for index, stats in list(write_doctype_stats.items()):
if stats is None:
log.info(" %s does not exist", index)
else:
log.info(" %s:", index)
for name, count in sorted(stats.items()):
log.info(" %-22s: %d", name, count)
if checkindex:
# Go through the index and verify everything
log.info("Checking index contents....")
missing_docs = 0
for cls, id_list in get_indexable():
for id_group in chunked(id_list, 100):
doc_list = get_documents(cls, id_group)
if len(id_group) != len(doc_list):
doc_list_ids = [doc["id"] for doc in doc_list]
for id_ in id_group:
if id_ not in doc_list_ids:
log.info(" Missing %s %s", cls.get_model_name(), id_)
missing_docs += 1
if missing_docs:
print("There were %d missing_docs" % missing_docs)
def es_search_cmd(query, pages=1, log=log):
"""Simulates a front page search"""
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
client = LocalizingClient()
output = []
output.append("Search for: %s" % query)
output.append("")
data = {"q": query, "format": "json"}
url = reverse("search")
# The search view shows 10 results at a time. So we hit it few
# times---once for each page.
for pageno in range(pages):
pageno = pageno + 1
data["page"] = pageno
resp = client.get(url, data)
if resp.status_code != 200:
output.append("ERROR: %s" % resp.content)
break
else:
content = json.loads(resp.content)
results = content["results"]
for mem in results:
output.append(
"%4d %5.2f %-10s %-20s"
% (mem["rank"], mem["score"], mem["type"], mem["title"])
)
output.append("")
for line in output:
log.info(line.encode("ascii", "ignore"))
def es_verify_cmd(log=log):
log.info("Behold! I am the magificent esverify command and I shall verify")
log.info("all things verifyable so that you can rest assured that your")
log.info("changes are bereft of the tawdry clutches of whimsy and")
log.info("misfortune.")
log.info("")
log.info("Verifying mappings do not conflict.")
# Verify mappings that share the same index don't conflict
for index in all_write_indexes():
merged_mapping = {}
log.info("Verifying mappings for index: {index}".format(index=index))
start_time = time.time()
for cls_name, mapping in list(get_mappings(index).items()):
mapping = mapping["properties"]
for key, val in list(mapping.items()):
if key not in merged_mapping:
merged_mapping[key] = (val, [cls_name])
continue
# FIXME - We're comparing two dicts here. This might not
# work for non-trivial dicts.
if merged_mapping[key][0] != val:
raise MappingMergeError(
"%s key different for %s and %s" % (key, cls_name, merged_mapping[key][1])
)
merged_mapping[key][1].append(cls_name)
log.info("Done! {0}".format(format_time(time.time() - start_time)))
log.info("")
def es_analyzer_for_locale(locale, synonyms=False, fallback="standard"):
"""Pick an appropriate analyzer for a given locale.
If no analyzer is defined for `locale`, return fallback instead,
which defaults to ES analyzer named "standard".
If `synonyms` is True, this will return a synonym-using analyzer,
if that makes sense. In particular, it doesn't make sense to use
synonyms with the fallback analyzer.
"""
if locale in settings.ES_LOCALE_ANALYZERS:
analyzer = settings.ES_LOCALE_ANALYZERS[locale]
if synonyms and locale in config.ES_SYNONYM_LOCALES:
analyzer += "-synonyms"
else:
analyzer = fallback
if not settings.ES_USE_PLUGINS and analyzer in settings.ES_PLUGIN_ANALYZERS:
analyzer = fallback
return analyzer
def es_query_with_analyzer(query, locale):
"""Transform a query dict to use _analyzer actions for the right fields."""
analyzer = es_analyzer_for_locale(locale, synonyms=True)
new_query = {}
# Import locally to avoid circular import
from kitsune.search.models import get_mapping_types
localized_fields = []
for mt in get_mapping_types():
localized_fields.extend(mt.get_localized_fields())
for k, v in list(query.items()):
field, action = k.split("__")
if field in localized_fields:
new_query[k + "_analyzer"] = (v, analyzer)
else:
new_query[k] = v
return new_query
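# Illustration, assuming (hypothetically) that "document_title" is one of
# the localized fields and the resolved analyzer is "snowball-english":
#
#   es_query_with_analyzer({"document_title__match": "crash"}, "en-US")
#   # -> {"document_title__match_analyzer": ("crash", "snowball-english")}
#
# Keys for non-localized fields pass through unchanged.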
def indexes_for_doctypes(doctype):
# Import locally to avoid circular import.
from kitsune.search.models import get_mapping_types
return set(d.get_index() for d in get_mapping_types(doctype))
def handle_es_errors(template, status_code=503):
"""Handles Elasticsearch exceptions for views
Wrap the entire view in this and don't worry about Elasticsearch exceptions
again!
:arg template: template path string or function to generate the template
path string for HTML requests
:arg status_code: status code to return
:returns: content-type-appropriate HttpResponse
"""
def handler(fun):
@wraps(fun)
def _handler(request, *args, **kwargs):
try:
return fun(request, *args, **kwargs)
except ES_EXCEPTIONS as exc:
is_json = request.GET.get("format") == "json"
callback = request.GET.get("callback", "").strip()
content_type = "application/x-javascript" if callback else "application/json"
if is_json:
return HttpResponse(
json.dumps({"error": _("Search Unavailable")}),
content_type=content_type,
status=status_code,
)
# If template is a function, call it with the request, args
# and kwargs to get the template.
if callable(template):
actual_template = template(request, *args, **kwargs)
else:
actual_template = template
# Log exceptions so this isn't failing silently
log.exception(exc)
return render(request, actual_template, status=503)
return _handler
return handler
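# Usage sketch (the view and template names here are hypothetical):
#
#   @handle_es_errors("search/unavailable.html")
#   def search(request):
#       ...  # any ES_EXCEPTIONS raised here render the fallback template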

View file

@@ -13,7 +13,5 @@
<Url type="application/opensearchdescription+xml"
rel="self"
template="{{ host }}{{ url('search.plugin', locale=locale) }}"/>
<Url type="application/x-suggestions+json"
template="{{ host }}{{ url('search.suggestions', locale=locale) }}?q={searchTerms}"/>
<moz:SearchForm>{{ host }}{{ url('search', locale=locale) }}</moz:SearchForm>
</OpenSearchDescription>

View file

View file

View file

@@ -1,21 +0,0 @@
from django.core.management.base import LabelCommand
from kitsune.search.es_utils import es_delete_cmd
from kitsune.search.utils import FakeLogger
class Command(LabelCommand):
label = "index"
help = "Delete an index from elastic search."
def add_arguments(self, parser):
super().add_arguments(parser)
parser.add_argument(
"--noinput",
action="store_true",
dest="noinput",
help="Do not ask for input--just do it",
)
def handle_label(self, label, **options):
es_delete_cmd(label, noinput=options["noinput"], log=FakeLogger(self.stdout))

View file

@@ -1,87 +0,0 @@
from django.core.management.base import BaseCommand, CommandError
from django.test.utils import override_settings
from kitsune.search.es_utils import es_reindex_cmd
from kitsune.search.utils import FakeLogger
class Command(BaseCommand):
help = "Reindex the database for Elastic."
def add_arguments(self, parser):
parser.add_argument(
"--percent",
type=int,
dest="percent",
default=100,
help="Reindex a percentage of things",
)
parser.add_argument(
"--delete", action="store_true", dest="delete", help="Wipes index before reindexing"
)
parser.add_argument(
"--hours-ago",
type=int,
dest="hours_ago",
default=0,
help="Reindex things updated N hours ago",
)
parser.add_argument(
"--minutes-ago",
type=int,
dest="minutes_ago",
default=0,
help="Reindex things updated N minutes ago",
)
parser.add_argument(
"--seconds-ago",
type=int,
dest="seconds_ago",
default=0,
help="Reindex things updated N seconds ago",
)
parser.add_argument(
"--mapping_types",
dest="mapping_types",
default=None,
help="Comma-separated list of mapping types to index",
)
parser.add_argument(
"--criticalmass",
action="store_true",
dest="criticalmass",
help="Indexes a critical mass of things",
)
# We (ab)use override_settings to force ES_LIVE_INDEXING for the
# duration of this command so that it actually indexes stuff.
@override_settings(ES_LIVE_INDEXING=True)
def handle(self, *args, **options):
percent = options["percent"]
delete = options["delete"]
mapping_types = options["mapping_types"]
criticalmass = options["criticalmass"]
seconds_ago = options["seconds_ago"]
seconds_ago += options["minutes_ago"] * 60
seconds_ago += options["hours_ago"] * 3600
if mapping_types:
mapping_types = mapping_types.split(",")
if not 1 <= percent <= 100:
raise CommandError("percent should be between 1 and 100")
if percent < 100 and seconds_ago:
raise CommandError("you can't specify a time ago and percent")
if criticalmass and seconds_ago:
raise CommandError("you can't specify a time ago and criticalmass")
if percent < 100 and criticalmass:
raise CommandError("you can't specify criticalmass and percent")
if mapping_types and criticalmass:
raise CommandError("you can't specify criticalmass and mapping_types")
es_reindex_cmd(
percent=percent,
delete=delete,
mapping_types=mapping_types,
criticalmass=criticalmass,
seconds_ago=seconds_ago,
log=FakeLogger(self.stdout),
)
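# Example invocation, assuming this command is registered as "esreindex"
# (the name used elsewhere in this codebase); option values are arbitrary:
#
#   ./manage.py esreindex --delete --percent 10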

View file

@@ -1,25 +0,0 @@
from django.core.management.base import BaseCommand
from kitsune.search.es_utils import es_search_cmd
from kitsune.search.utils import FakeLogger
class Command(BaseCommand):
help = "Does a front-page search for given query"
def add_arguments(self, parser):
super().add_arguments(parser)
parser.add_argument("args", metavar="search_term", nargs="+")
parser.add_argument(
"--pages",
type=int,
dest="pages",
default=1,
help="Number of pages of results you want to see",
)
def handle(self, *args, **options):
pages = options["pages"]
query = " ".join(args)
es_search_cmd(query, pages, FakeLogger(self.stdout))

View file

@@ -1,19 +0,0 @@
from django.core.management.base import BaseCommand
from kitsune.search.es_utils import es_status_cmd
from kitsune.search.utils import FakeLogger
class Command(BaseCommand):
help = "Shows elastic search index status."
def add_arguments(self, parser):
parser.add_argument(
"--checkindex",
action="store_true",
dest="checkindex",
help="Checks the index contents",
)
def handle(self, *args, **options):
es_status_cmd(options["checkindex"], log=FakeLogger(self.stdout))

View file

@@ -1,11 +0,0 @@
from django.core.management.base import BaseCommand
from kitsune.search.es_utils import es_verify_cmd
from kitsune.search.utils import FakeLogger
class Command(BaseCommand):
help = "Verifies correctness of all things verifyable."
def handle(self, *args, **options):
es_verify_cmd(FakeLogger(self.stdout))

View file

@@ -1,50 +1,14 @@
import datetime
import logging
from threading import local
from django.conf import settings
from django.core import signals
from django.db import models
from django.db.models.signals import m2m_changed, post_save, pre_delete
from django.dispatch import receiver
from elasticsearch.exceptions import NotFoundError
from elasticutils.contrib.django import MLT, Indexable, MappingType
from elasticutils.contrib.django import Indexable, MappingType
from kitsune.search import es_utils
from kitsune.search.tasks import index_task, unindex_task
from kitsune.search.utils import to_class_path
from kitsune.sumo.models import ModelBase
log = logging.getLogger("k.search.es")
# db_table_name -> MappingType class
_search_mapping_types = {}
def get_mapping_types(mapping_types=None):
"""Returns a list of MappingTypes"""
if mapping_types is None:
values = list(_search_mapping_types.values())
else:
values = [_search_mapping_types[name] for name in mapping_types]
# Sort to stabilize
values.sort(key=lambda cls: cls.get_mapping_type_name())
return values
# Holds a threadlocal set of indexing tasks to be filed after the request.
_local = local()
def _local_tasks():
"""(Create and) return the threadlocal set of indexing tasks."""
if getattr(_local, "tasks", None) is None:
_local.tasks = set()
return _local.tasks
class SearchMixin(object):
"""A mixin which adds ES indexing support for the model
@@ -64,19 +28,15 @@ class SearchMixin(object):
@classmethod
def get_mapping_type(cls):
"""Return the MappingType for this model"""
raise NotImplementedError
...
def index_later(self):
"""Register myself to be indexed at the end of the request."""
_local_tasks().add(
(index_task.delay, (to_class_path(self.get_mapping_type()), (self.pk,)))
)
return
def unindex_later(self):
"""Register myself to be unindexed at the end of the request."""
_local_tasks().add(
(unindex_task.delay, (to_class_path(self.get_mapping_type()), (self.pk,)))
)
return
class SearchMappingType(MappingType, Indexable):
@@ -102,204 +62,45 @@ class SearchMappingType(MappingType, Indexable):
@classmethod
def search(cls):
return es_utils.Sphilastic(cls)
...
@classmethod
def get_index(cls):
return es_utils.write_index(cls.get_index_group())
...
@classmethod
def get_index_group(cls):
return "default"
...
@classmethod
def get_query_fields(cls):
"""Return the list of fields for query"""
raise NotImplementedError
...
@classmethod
def get_localized_fields(cls):
return []
...
@classmethod
def get_indexable(cls, seconds_ago=0):
# Some models have a gazillion instances. So we want to go
# through them one at a time in a way that doesn't pull all
# the data into memory all at once. So we iterate through ids
# and pull objects one at a time.
qs = cls.get_model().objects.order_by("pk").values_list("pk", flat=True)
if seconds_ago:
if cls.seconds_ago_filter:
dt = datetime.datetime.now() - datetime.timedelta(seconds=seconds_ago)
qs = qs.filter(**{cls.seconds_ago_filter: dt})
else:
# if seconds_ago is specified but seconds_ago_filter is falsy don't index anything
return qs.none()
return qs
...
@classmethod
def reshape(cls, results):
"""Reshapes the results so lists are lists and everything is not"""
# FIXME: This is dumb because we're changing the shape of the
# results multiple times in a hokey-pokey kind of way. We
# should fix this after SUMO is using Elasticsearch 1.x and it
# probably involves an ElasticUtils rewrite or whatever the
# next generation is.
list_keys = cls.list_keys
# FIXME: This builds a new dict from the old dict. Might be
# cheaper to do it in-place.
return [
dict((key, (val if key in list_keys else val[0])) for key, val in list(result.items()))
for result in results
]
...
@classmethod
def index(cls, *args, **kwargs):
if not settings.ES_LIVE_INDEXING:
return
super(SearchMappingType, cls).index(*args, **kwargs)
...
@classmethod
def unindex(cls, *args, **kwargs):
if not settings.ES_LIVE_INDEXING:
return
try:
super(SearchMappingType, cls).unindex(*args, **kwargs)
except NotFoundError:
# Ignore the case where we try to delete something that's
# not there.
pass
...
@classmethod
def morelikethis(cls, id_, s, fields):
"""MoreLikeThis API"""
return list(MLT(id_, s, fields, min_term_freq=1, min_doc_freq=1))
def _identity(s):
return s
def register_for_indexing(app, sender_class, instance_to_indexee=_identity, m2m=False):
"""Registers a model for signal-based live-indexing.
As data changes in the database, we need to update the relevant
documents in the index. This function registers Django model
classes with the appropriate signals and update/delete routines
such that our index stays up-to-date.
:arg app: A bit of UID we use to build the signal handlers'
dispatch_uids. This is prepended to the ``sender_class``
model name, "elastic", and the signal name, so it should
combine with those to make something unique. For this reason,
the app name is usually a good choice, yielding something like
"wiki.TaggedItem.elastic.post_save".
:arg sender_class: The class to listen for saves and deletes on.
:arg instance_to_indexee: A callable which takes the signalling
instance and returns the model instance to be indexed. The
returned instance should be a subclass of SearchMixin. If the
callable returns None, no indexing is performed.
Default: a callable which returns the sender itself.
:arg m2m: True if this is a m2m model and False otherwise.
Examples::
# Registers MyModel for indexing. post_save creates new
# documents in the index. pre_delete removes documents
# from the index.
register_for_indexing(MyModel, 'some_app')
# Registers RelatedModel for indexing. RelatedModel is related
# to some model in the sense that the document in the index is
# composed of data from some model and it's related
# RelatedModel instance. Because of that when we update
# RelatedModel instances, we need to update the associated
# document in the index for the related model.
#
# This registers the RelatedModel for indexing. post_save and
# pre_delete update the associated document in the index for
# the related model. The related model instance is determined
# by the instance_to_indexee function.
register_for_indexing(RelatedModel, 'some_app',
instance_to_indexee=lambda r: r.my_model)
"""
def maybe_call_method(instance, is_raw, method_name):
"""Call an (un-)indexing method on instance if appropriate."""
obj = instance_to_indexee(instance)
if obj is not None and not is_raw:
getattr(obj, method_name)()
def update(sender, instance, **kw):
"""File an add-to-index task for the indicated object."""
maybe_call_method(instance, kw.get("raw"), "index_later")
def delete(sender, instance, **kw):
"""File a remove-from-index task for the indicated object."""
maybe_call_method(instance, kw.get("raw"), "unindex_later")
def indexing_receiver(signal, signal_name):
"""Return a routine that registers signal handlers for indexers.
The returned registration routine uses strong refs, makes up a
dispatch_uid, and uses ``sender_class`` as the sender.
"""
return receiver(
signal,
sender=sender_class,
dispatch_uid="%s.%s.elastic.%s" % (app, sender_class.__name__, signal_name),
weak=False,
)
if m2m:
# This is an m2m model, so we register m2m_changed and it
# updates the existing document in the index.
indexing_receiver(m2m_changed, "m2m_changed")(update)
else:
indexing_receiver(post_save, "post_save")(update)
indexing_receiver(pre_delete, "pre_delete")(
# If it's the indexed instance that's been deleted, go ahead
# and delete it from the index. Otherwise, we just want to
# update whatever model it's related to.
delete
if instance_to_indexee is _identity
else update
)
def register_mapping_type(cls):
"""Class decorator for registering MappingTypes for search"""
_search_mapping_types[cls.get_mapping_type_name()] = cls
return cls
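# Usage sketch (the mapping type class here is hypothetical):
#
#   @register_mapping_type
#   class ThreadMappingType(SearchMappingType):
#       @classmethod
#       def get_model(cls):
#           return Thread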
def generate_tasks(**kwargs):
"""Goes through thread local index update tasks set and generates
celery tasks for all tasks in the set.
Because this works off of a set, it naturally de-dupes the tasks,
so if four tasks get tossed into the set that are identical, we
execute it only once.
"""
tasks = _local_tasks()
for fun, args in tasks:
fun(*args)
tasks.clear()
signals.request_finished.connect(generate_tasks)
...
class RecordManager(models.Manager):

View file

@@ -1,124 +0,0 @@
from itertools import chain
from django.conf import settings
from elasticsearch import RequestsHttpConnection
from kitsune import search as constants
from kitsune.questions.models import QuestionMappingType
from kitsune.search import es_utils
from kitsune.wiki.models import DocumentMappingType
def apply_boosts(searcher):
"""Returns searcher with boosts applied"""
return searcher.boost(
question_title=4.0,
question_content=3.0,
question_answer_content=3.0,
post_title=2.0,
post_content=1.0,
document_title=6.0,
document_content=1.0,
document_keywords=8.0,
document_summary=2.0,
# Text phrases in document titles and content get an extra boost.
document_title__match_phrase=10.0,
document_content__match_phrase=8.0,
)
def generate_simple_search(search_form, language, with_highlights=False):
"""Generates an S given a form
:arg search_form: a validated SimpleSearch form
:arg language: the language code
:arg with_highlights: whether or not to ask for highlights
:returns: a fully formed S
"""
# We use a regular S here because we want to search across
# multiple doctypes.
searcher = (
es_utils.AnalyzerS()
.es(
urls=settings.ES_URLS,
timeout=settings.ES_TIMEOUT,
use_ssl=settings.ES_USE_SSL,
http_auth=settings.ES_HTTP_AUTH,
connection_class=RequestsHttpConnection,
)
.indexes(es_utils.read_index("default"))
)
cleaned = search_form.cleaned_data
doctypes = []
final_filter = es_utils.F()
cleaned_q = cleaned["q"]
products = cleaned["product"]
# Handle wiki filters
if cleaned["w"] & constants.WHERE_WIKI:
wiki_f = es_utils.F(
model="wiki_document",
document_category__in=settings.SEARCH_DEFAULT_CATEGORIES,
document_locale=language,
document_is_archived=False,
)
for p in products:
wiki_f &= es_utils.F(product=p)
doctypes.append(DocumentMappingType.get_mapping_type_name())
final_filter |= wiki_f
# Handle question filters
if cleaned["w"] & constants.WHERE_SUPPORT:
question_f = es_utils.F(
model="questions_question", question_is_archived=False, question_has_helpful=True
)
for p in products:
question_f &= es_utils.F(product=p)
doctypes.append(QuestionMappingType.get_mapping_type_name())
final_filter |= question_f
# Build a filter for those filters and add the other bits to
# finish the search
searcher = searcher.doctypes(*doctypes)
searcher = searcher.filter(final_filter)
if cleaned["explain"]:
searcher = searcher.explain()
if with_highlights:
# Set up the highlights. Show the entire field highlighted.
searcher = searcher.highlight(
"question_content", # support forum
"document_summary", # kb
pre_tags=["<b>"],
post_tags=["</b>"],
number_of_fragments=0,
)
searcher = apply_boosts(searcher)
# Build the query
query_fields = chain(
*[cls.get_query_fields() for cls in [DocumentMappingType, QuestionMappingType]]
)
query = {}
# Create match and match_phrase queries for every field
# we want to search.
for field in query_fields:
for query_type in ["match", "match_phrase"]:
query["%s__%s" % (field, query_type)] = cleaned_q
# Transform the query to use locale aware analyzers.
query = es_utils.es_query_with_analyzer(query, language)
searcher = searcher.query(should=True, **query)
return searcher
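# Usage sketch (assumes a validated SimpleSearch form, as required above):
#
#   searcher = generate_simple_search(search_form, "en-US", with_highlights=True)
#   results = searcher[:10]  # the S is evaluated lazily when sliced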

View file

@@ -1,83 +0,0 @@
"""
Utilities for working with synonyms, both in the database and in ES.
"""
import re
from kitsune.search import es_utils
from kitsune.search.models import Synonym
class SynonymParseError(Exception):
"""One or more parser errors were found. Has a list of errors found."""
def __init__(self, errors, *args, **kwargs):
super(SynonymParseError, self).__init__(*args, **kwargs)
self.errors = errors
def parse_synonyms(text):
"""
Parse synonyms from user entered text.
The input should look something like
foo => bar
baz, qux => flob, glork
:returns: A set of 2-tuples, ``(from_words, to_words)``. ``from_words``
and ``to_words`` will be strings.
:throws: A SynonymParseError, if any errors are found.
"""
errors = []
synonyms = set()
for i, line in enumerate(text.split("\n"), 1):
line = line.strip()
if not line:
continue
count = line.count("=>")
if count < 1:
errors.append("Syntax error on line %d: No => found." % i)
elif count > 1:
errors.append("Syntax error on line %d: Too many => found." % i)
else:
from_words, to_words = [s.strip() for s in line.split("=>")]
synonyms.add((from_words, to_words))
if errors:
raise SynonymParseError(errors)
else:
return synonyms
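# Illustration (input is hypothetical):
#
#   parse_synonyms("foo => bar\nbaz, qux => flob, glork")
#   # -> {("foo", "bar"), ("baz, qux", "flob, glork")}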
def count_out_of_date():
"""
Count number of synonyms that differ between the database and ES.
:returns: A 2-tuple where the first element is the number of synonyms
that are in the DB but not in ES, and the second element is the
number of synonyms in ES that are not in the DB.
"""
es = es_utils.get_es()
index_name = es_utils.write_index("default")
settings = es.indices.get_settings(index_name).get(index_name, {}).get("settings", {})
synonym_key_re = re.compile(r"index\.analysis\.filter\.synonyms-.*\.synonyms\.\d+")
synonyms_in_es = set()
for key, val in list(settings.items()):
if synonym_key_re.match(key):
synonyms_in_es.add(val)
synonyms_in_db = set(str(s) for s in Synonym.objects.all())
synonyms_to_add = synonyms_in_db - synonyms_in_es
synonyms_to_remove = synonyms_in_es - synonyms_in_db
if synonyms_to_remove == {"firefox => firefox"}:
synonyms_to_remove = set()
return (len(synonyms_to_add), len(synonyms_to_remove))

View file

@@ -1,177 +0,0 @@
import datetime
import logging
import sys
import traceback
from celery import task
from elasticutils.contrib.django import get_es
from multidb.pinning import pin_this_thread, unpin_this_thread
from kitsune.search.es_utils import UnindexMeBro, get_analysis, index_chunk, write_index
from kitsune.search.utils import from_class_path
# This is present in memcached when reindexing is in progress and
# holds the number of outstanding index chunks. Once it hits 0,
# indexing is done.
OUTSTANDING_INDEX_CHUNKS = "search:outstanding_index_chunks"
CHUNK_SIZE = 50000
log = logging.getLogger("k.task")
class IndexingTaskError(Exception):
"""Exception that captures current exception information
Some exceptions aren't pickleable. This uses the traceback module to
format the exception that's currently being thrown and tosses it
in the message of IndexingTaskError at the time the
IndexingTaskError is created.
So you can do this::
try:
# some code that throws an error
except Exception as exc:
raise IndexingTaskError()
The message will have the message and traceback from the original
exception thrown.
Yes, this is goofy.
"""
def __init__(self):
super(IndexingTaskError, self).__init__(traceback.format_exc())
@task()
def index_chunk_task(write_index, batch_id, rec_id, chunk):
"""Index a chunk of things.
:arg write_index: the name of the index to index to
:arg batch_id: the name for the batch this chunk belongs to
:arg rec_id: the id for the record for this task
:arg chunk: a (class, id_list) of things to index
"""
cls_path, id_list = chunk
cls = from_class_path(cls_path)
rec = None
# Need to import Record here to prevent circular import
from kitsune.search.models import Record
try:
# Pin to master db to avoid replication lag issues and stale data.
pin_this_thread()
# Update record data.
rec = Record.objects.get(pk=rec_id)
rec.start_time = datetime.datetime.now()
rec.message = "Reindexing into %s" % write_index
rec.status = Record.STATUS_IN_PROGRESS
rec.save()
index_chunk(cls, id_list, reraise=True)
rec.mark_success()
except Exception:
if rec is not None:
rec.mark_fail("Errored out %s %s" % (sys.exc_info()[0], sys.exc_info()[1]))
log.exception("Error while indexing a chunk")
# Some exceptions aren't pickleable and we need this to throw
# things that are pickleable.
raise IndexingTaskError()
finally:
unpin_this_thread()
# Note: If you reduce the length of RETRY_TIMES, it affects all tasks
# currently in the celery queue---they'll throw an IndexError.
RETRY_TIMES = (
60, # 1 minute
5 * 60, # 5 minutes
10 * 60, # 10 minutes
30 * 60, # 30 minutes
60 * 60, # 60 minutes
)
MAX_RETRIES = len(RETRY_TIMES)
@task()
def index_task(cls_path, id_list, **kw):
"""Index documents specified by cls and ids"""
cls = from_class_path(cls_path)
try:
# Pin to master db to avoid replication lag issues and stale
# data.
pin_this_thread()
qs = cls.get_model().objects.filter(pk__in=id_list).values_list("pk", flat=True)
for id_ in qs:
try:
cls.index(cls.extract_document(id_), id_=id_)
except UnindexMeBro:
# If extract_document throws this, then we need to
# remove this item from the index.
cls.unindex(id_)
except Exception as exc:
retries = index_task.request.retries
if retries >= MAX_RETRIES:
# Some exceptions aren't pickleable and we need this to
# throw things that are pickleable.
raise IndexingTaskError()
index_task.retry(exc=exc, max_retries=MAX_RETRIES, countdown=RETRY_TIMES[retries])
finally:
unpin_this_thread()
@task()
def unindex_task(cls_path, id_list, **kw):
"""Unindex documents specified by cls and ids"""
cls = from_class_path(cls_path)
try:
# Pin to master db to avoid replication lag issues and stale
# data.
pin_this_thread()
for id_ in id_list:
cls.unindex(id_)
except Exception as exc:
retries = unindex_task.request.retries
if retries >= MAX_RETRIES:
# Some exceptions aren't pickleable and we need this to
# throw things that are pickleable.
raise IndexingTaskError()
unindex_task.retry(exc=exc, max_retries=MAX_RETRIES, countdown=RETRY_TIMES[retries])
finally:
unpin_this_thread()
@task()
def update_synonyms_task():
es = get_es()
# Close the index, update the settings, then re-open it.
# This will cause search to be unavailable for a few seconds.
# This updates all of the analyzer settings, which is kind of overkill,
# but will make sure everything stays consistent.
index = write_index("default")
analysis = get_analysis()
# if anything goes wrong, it is very important to re-open the index.
try:
es.indices.close(index)
es.indices.put_settings(
index=index,
body={
"analysis": analysis,
},
)
finally:
es.indices.open(index)

View file

@@ -1,72 +0,0 @@
{% extends "kadmin/base.html" %}
{% block content_title %}
<h1>Elastic Search - Index Browser of Doom</h1>
{% endblock %}
{% block content %}
<section>
<h1>Find a specific item</h1>
<form method="GET">
<label for="bucket-field">Model:</label>
<select name="bucket" id="bucket-field">
{% for bucket in buckets %}
<option value="{{ bucket }}"{% if requested_bucket == bucket %} selected{% endif %}>{{ bucket }}</option>
{% endfor %}
</select>
<label for="id-field">ID:</label>
<input type="text" name="id" id="id-field" value="{{ requested_id }}">
<input type="Submit">
</form>
</section>
{% if requested_data %}
<section>
<h1>Item {{ requested_data.id }} from {{ requested_bucket }}</h1>
<table>
<thead>
<tr>
<th>key</th>
<th>value</th>
</tr>
</thead>
<tbody>
{% for key, val in requested_data.items %}
<tr>
<th>{{ key }}</th>
<td>{{ val }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</section>
{% else %}
<section>
<h1>Most recently indexed items per bucket</h1>
{% for cls_name, items in last_20_by_bucket %}
<h2>{{ cls_name }}</h2>
<table>
<thead>
<tr>
<th>id</th>
<th>title</th>
<th>indexed on ({{ settings.TIME_ZONE }})</th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr>
<td><a href="?bucket={{ cls_name }}&id={{ item.id }}">{{ item.id }}</a></td>
{# cheating here because only one of these is filled, but #}
{# in django templates, you get an empty string if it's #}
{# not there. #}
<td>{{ item.question_title }}{{ item.post_title }}{{ item.document_title }}</td>
<td>{{ item.indexed_on }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
</section>
{% endif %}
{% endblock %}

View file

@@ -1,382 +0,0 @@
{% extends "kadmin/base.html" %}
{% block content_title %}
<h1>Elastic Search</h1>
{% endblock %}
{% block extrastyle %}
{{ block.super }}
<style type="text/css">
div#content div {
margin-bottom: .5em;
}
.disabled {
color: #ccc;
}
progress {
width: 400px;
}
dd {
margin-left: 1em;
}
input[type="submit"].DANGER {
border: 3px red solid;
font: bold 12px/14px serif;
}
.errorspan {
background: #ffc;
border: 1px solid red;
padding: 1.5px;
}
.errorspan img {
transform: translate(0,-1px);
}
table.reindextable td.explanation {
width: 40%;
}
</style>
{% endblock %}
{% block content %}
<section>
<p>
Page last rendered: {{ now }} {{ settings.TIME_ZONE }}
</p>
</section>
{% if error_messages %}
<section>
<h1>Errors</h1>
{% for msg in error_messages %}
<p>{{ msg }}</p>
{% endfor %}
</section>
{% endif %}
{% if outstanding_records.count > 0 %}
<section>
<p>
Auto-refreshing every 30 seconds :: <a href="{{ request.path }}">Refresh page</a>
</p>
<script>setTimeout("window.location.reload(true);", 30000);</script>
<h2>{{ outstanding_records.count }} outstanding records</h2>
<table>
<thead>
<tr>
<th>batch</th>
<th>name</th>
<th>created</th>
<th>start</th>
<th>end</th>
<th>message</th>
<th>delta</th>
</tr>
</thead>
{% for record in outstanding_records %}
<tr>
<td>{{ record.batch_id }}</td>
<td>{{ record.name }}</td>
<td>{{ record.creation_time }}</td>
<td>{{ record.start_time }}</td>
<td>{{ record.end_time }}</td>
<td>{{ record.message }}</td>
<td>{{ record.delta }}</td>
</tr>
{% endfor %}
</table>
</section>
{% endif %}
{% if outstanding_chunks %}
<section>
<h1>Indexing in progress! Outstanding tasks: {{ outstanding_chunks }}</h1>
<p>
</p>
<table>
<thead>
<tr>
<th>message</th>
<th>start time</th>
</tr>
</thead>
<tbody>
{% for record in outstanding_records %}
<tr>
<td>{{ record.text }}</td>
<td>{{ record.starttime }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<p>
Note: The number of records may not line up with the number of
outstanding indexing tasks because records are created when
the task starts.
</p>
</section>
{% endif %}
<section>
<h1>Settings and Elasticsearch details</h1>
<p>
Settings at the time this page was loaded:
</p>
<table>
<tr><th>ES_LIVE_INDEXING</th><td>{{ settings.ES_LIVE_INDEXING }}</td></tr>
<tr><th>ES_INDEX_PREFIX</th><td>{{ settings.ES_INDEX_PREFIX }}</td></tr>
<tr><th>ES_INDEXES</th><td>{{ settings.ES_INDEXES }}</td></tr>
<tr><th>ES_WRITE_INDEXES</th><td>{{ settings.ES_WRITE_INDEXES }}</td></tr>
<tr><th>Elasticsearch version</th><td>{{ es_deets.version.number }}</td></tr>
</table>
</section>
<section>
<h1>Index Status</h1>
<p>
All available indexes:
</p>
<table>
<thead>
<th>Index Name</th>
<th>Documents</th>
<th>Type</th>
<th>Delete</th>
</thead>
<tbody>
{% for index_name, index_count in indexes %}
<tr>
<td>{{ index_name }}</td>
<td>{{ index_count }}</td>
<td>
{% if index_name in read_indexes and index_name in write_indexes %}
READ/WRITE
{% else %}
{% if index_name in read_indexes %}
READ
{% else %}
{% if index_name in write_indexes %}
WRITE
{% endif %}
{% endif %}
{% endif %}
</td>
{% if index_name not in read_indexes %}
<td>
<form method="POST">
{% csrf_token %}
<input type="hidden" name="delete_index" value="{{ index_name }}">
<input type="submit" value="Delete">
</form>
</td>
{% else %}
<td>Disabled</td>
{% endif %}
</tr>
{% endfor %}
</tbody>
</table>
<h2>Read indexes</h2>
{% for index, stats in doctype_stats.items %}
<h3>{{ index }}</h3>
{% if stats == None %}
<p>Index does not exist.</p>
{% else %}
<table>
<thead>
<tr><th>doctype</th><th>count</th></tr>
</thead>
<tbody>
{% for doctype, count in stats.items %}
<tr><td>{{ doctype }}</td><td>{{ count }}</td></tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% endfor %}
<h2>Write indexes</h2>
{% if read_indexes == write_indexes %}
<p>
Write indexes are the same as the read indexes.
</p>
{% else %}
{% for index, stats in doctype_stats.items %}
<h3>{{ index }}</h3>
{% if stats == None %}
<p>Index does not exist.</p>
{% else %}
<table>
<thead>
<tr><th>doctype</th><th>count</th></tr>
</thead>
<tbody>
{% for doctype, count in stats.items %}
<tr><td>{{ doctype }}</td><td>{{ count }}</td></tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% endfor %}
{% endif %}
</section>
<section>
<h1>Actions</h1>
<table class="reindextable">
<tr>
<td colspan="2">
<h2>REINDEX into existing index</h2>
</td>
</tr>
<tr>
<td class="explanation">
<p>
Reindex into the existing WRITE index. Don't do this if you've
made mapping changes since this does not recreate the index with
the new mappings.
</p>
{% if outstanding_chunks %}
<p class="errornote">
WARNING! There are outstanding index tasks! Don't launch another
indexing pass unless you really know you want to.
</p>
{% endif %}
{% if not settings.ES_LIVE_INDEXING %}
<p class="errornote">
WARNING! <tt>ES_LIVE_INDEXING</tt> is False so you can't
reindex via the admin. Either enable <tt>ES_LIVE_INDEXING</tt>
or use the command line <tt>./manage.py esreindex</tt>.
</p>
{% endif %}
</td>
<td>
{% if doctype_write_stats != None %}
<form method="POST">
{% csrf_token %}
{% for index, stats in doctype_write_stats.items %}
<h3>{{ index }}</h3>
{% for doctype, count in stats.items %}
<input id="check_{{ doctype }}" type="checkbox" name="check_{{ doctype }}" value="yes" checked>
<label for="check_{{ doctype }}">{{ doctype }}</label><br>
{% endfor %}
{% endfor %}
<input type="submit" name="reindex" value="Reindex into write indexes"
{% if not settings.ES_LIVE_INDEXING or outstanding_chunks %}disabled{% endif %}>
</form>
{% endif %}
</td>
</tr>
<tr>
<td colspan="2">
<h2>DELETE existing index group's write index, recreate it and reindex</h2>
</td>
</tr>
<tr>
<td class="explanation">
<p>
This <strong>DELETES</strong> the existing WRITE index for a
group, recreates it with the mappings, and indexes into the new
index. You should only need to do this when the search mapping
changes or when setting up the site for the first time.
</p>
{% if read_indexes == write_indexes %}
<p class="errornote">
WARNING! All read and write indexes are the same! Deleting and
rebuilding the index would be really bad!
</p>
{% endif %}
</td>
<td>
<form method="POST">
<table>
<tr>
<th></th>
<th>Group</th>
<th>Read Index</th>
<th>Write Index</th>
</tr>
{% for group, group_read_index, group_write_index in index_group_data %}
<tr>
<td>
<input id="check_{{ group }}" type="checkbox" name="check_{{ group }}" value="yes"
{% if group_read_index != group_write_index %}checked{% endif %}>
</td>
<td><label for="check_{{ group }}">{{ group }}</label></td>
<td>{{ group_read_index }}</td>
<td>{{ group_write_index }}</td>
<td>
{% if group_read_index == group_write_index %}
<span class="errorspan">
<img src="{{ STATIC_URL }}admin/img/icon_error.gif" />
This group's write index is a read index!
</span>
{% endif %}
</td>
</tr>
{% endfor %}
</table>
{% csrf_token %}
<input class="DANGER" type="submit" name="recreate_index" value="DELETE selected indexes and reindex"
{% if not settings.ES_LIVE_INDEXING or outstanding_chunks %}disabled{% endif %}>
</form>
</td>
</tr>
<tr>
<td colspan="2">
<h2>RESET records and mark as failed</h2>
</td>
</tr>
<tr>
<td class="explanation">
<p>
This marks outstanding records as failed. This allows you to run a
new reindexing pass.
</p>
</td>
<td>
<form method="POST">
{% csrf_token %}
<input type="hidden" name="reset" value="1">
<input type="submit" name="reset" value="Mark records as failed">
</form>
</td>
</tr>
</table>
</section>
<section>
<h1>Reindexing history</h1>
<table>
<thead>
<tr>
<th>batch</th>
<th>name</th>
<th>created</th>
<th>start</th>
<th>end</th>
<th>status</th>
<th>message</th>
<th>delta</th>
</tr>
</thead>
<tbody>
{% for record in recent_records %}
<tr>
<td>{{ record.batch_id }}</td>
<td>{{ record.name }}</td>
<td>{{ record.creation_time }}</td>
<td>{{ record.start_time }}</td>
<td>{{ record.end_time }}</td>
<td>{{ record.status }}</td>
<td>{{ record.message }}</td>
<td>{{ record.delta }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</section>
{% endblock %}


@@ -1,14 +0,0 @@
{% extends "kadmin/base.html" %}
{% block content_title %}
<h1>Elastic Search - Mapping Browser</h1>
{% endblock %}
{% block content %}
<section>
<h1>Merged Mapping</h1>
<pre>
{{ mapping }}
</pre>
</section>
{% endblock %}


@@ -1,150 +0,0 @@
{% extends "kadmin/base.html" %}
{% block content_title %}
<h1>Elastic Search - Synonym Editor</h1>
{% endblock %}
{% block content %}
<style>
.errornote,
.notice,
p,
textarea {
box-sizing: border-box;
max-width: 600px;
}
.notice {
font-weight: bold;
}
textarea {
height: 400px;
width: 600px;
border-left: 0;
margin-left: 0;
padding-left: 7px;
line-height: 14px;
}
.line-numbers {
box-sizing: border-box;
display: inline-block;
font-size: 9px;
line-height: 14px;
margin: 2px 0;
height: 400px;
overflow: hidden;
padding: 4px 7px 2px 5px;
border: 1px solid #ccc;
border-right: 1px dotted #ddd;
text-align: right;
opacity: 0.8;
}
</style>
<section>
<p>
There are currently {{ synonym_add_count }}
synonym{{ synonym_add_count|pluralize }} that have not been synced to ES,
and {{ synonym_remove_count }} synonym{{ synonym_remove_count|pluralize }}
that need to be removed from ES.
</p>
<form method="POST">
{% csrf_token %}
<input type="hidden" name="sync_synonyms" value="1">
<input type="submit" value="Sync synonyms to ES">
</form>
<p>
Press this button to update the synonym list in Elasticsearch to match
what is in the database.
</p>
<p>
Keep in mind that changing synonyms will cause a brief downtime
for the search system, during which users will receive a friendly
error message, and some parts of the site will be slower. This downtime
should only last a few seconds. Consider doing this during off-peak
hours, like after 00:00 UTC.
</p>
</section>
<section>
<p class="notice">
This is an advanced way to edit synonyms with no training wheels.
It is intended for bulk insertion and mass editing by expert users.
If that doesn't sound like you, you can use the
<a href="{% url 'admin:search_synonym_changelist' %}">simpler interface</a>
instead. Remember to come back here and click that sync button above
after editing the synonyms.
</p>
<p>
This is <strong>all</strong> the synonyms for the site. If you add a line,
it will create a new synonym set. If you delete a line, that synonym will
be deleted. Be careful!
</p>
<p>
The format here is one synonym set per line. A synonym set is a set of
words on the left that will be transformed into the set of words on the right, with a
"fat arrow" (<code>=></code>) in between. Each of the words in the left
set will be converted to all of the words on the right. For example, a line
like <code>social => facebook, twitter</code> would make a search for
"social integration" also match documents like "facebook integration"
and "twitter integration". It's fine to have multi-word phrases like
<code>address bar, location bar, awesome bar => address bar, location bar, awesome bar</code>,
which would make those three phrases completely interchangeable.
</p>
<p>
Note that the original word is lost during the conversion. If you want to
keep the original word(s) in the search, include those words on the right.
For example, <code>social => facebook, twitter, social</code>. Also,
synonyms are one-way only. If you want two-way synonyms, you need two
lines, or to put all words on both sides.
</p>
{% for error in errors %}
<span class="errornote">
{{ error }}
</span>
{% endfor %}
<form method="POST">
{% csrf_token %}
<input type="submit" value="Save">
<br>
<!-- No space between these elements. -->
<div class="line-numbers"></div><textarea name="synonyms_text">{{ synonyms_text }}</textarea>
</form>
<p>Note, those are line numbers, not ID numbers.</p>
</section>
<script type="text/javascript">
// jquery isn't loaded yet. lame.
var textbox = document.querySelector('[name=synonyms_text]');
var lineNums = document.querySelector('.line-numbers');
function makeLineNumbers() {
// match() returns null when there are no newlines, so fall back to an empty list.
var numLines = (textbox.value.match(/\n/g) || []).length + 1;
var linesHtml = '';
for (var i = 1; i <= numLines; i++) {
linesHtml += i + '<br>';
}
lineNums.innerHTML = linesHtml;
lineNums.scrollTop = textbox.scrollTop;
}
textbox.addEventListener('change', makeLineNumbers);
textbox.addEventListener('keyup', makeLineNumbers);
textbox.addEventListener('scroll', function() {
lineNums.scrollTop = textbox.scrollTop;
});
makeLineNumbers();
</script>
{% endblock %}
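The fat-arrow format described above is easy to sanity-check outside the admin. Below is a minimal Python sketch, not taken from this commit, of how a single synonym line could be split into its left and right word sets; parse_synonym_line is a hypothetical helper, not kitsune's real parser.

# Minimal sketch, assuming the "left => right" format described above.
# parse_synonym_line is a hypothetical helper, not kitsune's parser.
def parse_synonym_line(line):
    parts = [part.strip() for part in line.split("=>")]
    if len(parts) != 2:
        # The real parser collects errors per line; here we simply raise.
        raise ValueError("expected exactly one '=>' in %r" % line)
    from_words, to_words = parts
    return from_words, to_words

# A search for "social" would then also match "facebook" and "twitter".
print(parse_synonym_line("social => facebook, twitter"))
# -> ('social', 'facebook, twitter')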


@@ -1,26 +0,0 @@
from django.test.client import RequestFactory
from django.test.utils import override_settings
import factory
from kitsune.search.models import Synonym
from kitsune.sumo.tests import TestCase
# Dummy request for passing to question_searcher() and brethren.
dummy_request = RequestFactory().get("/")
@override_settings(ES_LIVE_INDEXING=True)
class ElasticTestCase(TestCase):
"""Base class for Elastic Search tests, providing some conveniences"""
search_tests = True
class SynonymFactory(factory.DjangoModelFactory):
class Meta:
model = Synonym
from_words = "foo, bar"
to_words = "baz"


@@ -1,276 +0,0 @@
import json
import time
from nose.tools import eq_
from rest_framework.test import APIClient
from django.conf import settings
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.sumo.urlresolvers import reverse
from kitsune.questions.tests import QuestionFactory, AnswerFactory
from kitsune.products.tests import ProductFactory
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
class SuggestViewTests(ElasticTestCase):
client_class = APIClient
# TODO: This should probably be a subclass of QuestionFactory
def _make_question(self, solved=True, **kwargs):
defaults = {
"title": "Login to website comments disabled " + str(time.time()),
"content": """
readersupportednews.org, sends me emails with a list of
articles to read.
The links to the articles work as normal, except that I
cannot login from the linked article - as required - to
send my comments.
I see a javascript activity statement at the bottom left
corner of my screen while the left button is depressed
on the Login button. it is gone when I release the left
button, but no results.
I have the latest (7) version of java enabled, on an XP
box.
Why this inability to login to this website commentary?
""",
}
defaults.update(kwargs)
q = QuestionFactory(**defaults)
if solved:
a = AnswerFactory(question=q)
q.solution = a
# Trigger a reindex for the question.
q.save()
return q
# TODO: This should probably be a subclass of DocumentFactory
def _make_document(self, **kwargs):
defaults = {
"title": "How to make a pie from scratch with email " + str(time.time()),
"category": 10,
}
defaults.update(kwargs)
d = DocumentFactory(**defaults)
RevisionFactory(document=d, is_approved=True)
d.save()
return d
def test_invalid_product(self):
res = self.client.get(reverse("search.suggest"), {"product": "nonexistant", "q": "search"})
eq_(res.status_code, 400)
eq_(res.data, {"product": ['Could not find product with slug "nonexistant".']})
def test_invalid_locale(self):
res = self.client.get(reverse("search.suggest"), {"locale": "bad-medicine", "q": "search"})
eq_(res.status_code, 400)
eq_(res.data, {"locale": ['Could not find locale "bad-medicine".']})
def test_invalid_fallback_locale_none_case(self):
# Test the locale -> locale case.
non_none_locale_fallback_pairs = [
(key, val)
for key, val in sorted(settings.NON_SUPPORTED_LOCALES.items())
if val is not None
]
locale, fallback = non_none_locale_fallback_pairs[0]
res = self.client.get(reverse("search.suggest"), {"locale": locale, "q": "search"})
eq_(res.status_code, 400)
error_message = '"{0}" is not supported, but has fallback locale "{1}".'.format(
locale, fallback
)
eq_(res.data, {"locale": [error_message]})
def test_invalid_fallback_locale_non_none_case(self):
# Test the locale -> None case which falls back to WIKI_DEFAULT_LANGUAGE.
has_none_locale_fallback_pairs = [
(key, val)
for key, val in sorted(settings.NON_SUPPORTED_LOCALES.items())
if val is None
]
locale, fallback = has_none_locale_fallback_pairs[0]
res = self.client.get(reverse("search.suggest"), {"locale": locale, "q": "search"})
eq_(res.status_code, 400)
error_message = '"{0}" is not supported, but has fallback locale "{1}".'.format(
locale, settings.WIKI_DEFAULT_LANGUAGE
)
eq_(res.data, {"locale": [error_message]})
def test_invalid_numbers(self):
res = self.client.get(
reverse("search.suggest"),
{
"max_questions": "a",
"max_documents": "b",
"q": "search",
},
)
eq_(res.status_code, 400)
eq_(
res.data,
{
"max_questions": ["A valid integer is required."],
"max_documents": ["A valid integer is required."],
},
)
def test_q_required(self):
res = self.client.get(reverse("search.suggest"))
eq_(res.status_code, 400)
eq_(res.data, {"q": ["This field is required."]})
def test_it_works(self):
q1 = self._make_question()
d1 = self._make_document()
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
eq_([q["id"] for q in req.data["questions"]], [q1.id])
eq_([d["title"] for d in req.data["documents"]], [d1.title])
def test_filters_in_postdata(self):
q1 = self._make_question()
d1 = self._make_document()
self.refresh()
data = json.dumps({"q": "emails"})
# Note: Have to use .generic() because .get() will convert the
# data into querystring params and then it's clownshoes all
# the way down.
req = self.client.generic(
"GET", reverse("search.suggest"), data=data, content_type="application/json"
)
eq_(req.status_code, 200)
eq_([q["id"] for q in req.data["questions"]], [q1.id])
eq_([d["title"] for d in req.data["documents"]], [d1.title])
def test_both_querystring_and_body_raises_error(self):
self._make_question()
self._make_document()
self.refresh()
data = json.dumps({"q": "emails"})
# Note: Have to use .generic() because .get() will convert the
# data into querystring params and then it's clownshoes all
# the way down.
req = self.client.generic(
"GET",
reverse("search.suggest") + "?max_documents=3",
data=data,
content_type="application/json",
)
eq_(req.status_code, 400)
eq_(
req.data,
{"detail": "Put all parameters either in the querystring or the HTTP request body."},
)
def test_questions_max_results_0(self):
self._make_question()
self.refresh()
# Make sure something matches the query first.
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
eq_(len(req.data["questions"]), 1)
# If we specify "don't give me any" make sure we don't get any.
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_questions": "0"})
eq_(len(req.data["questions"]), 0)
def test_questions_max_results_non_0(self):
self._make_question()
self._make_question()
self._make_question()
self._make_question()
self._make_question()
self.refresh()
# Make sure something matches the query first.
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
eq_(len(req.data["questions"]), 5)
# Make sure we get only 3.
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_questions": "3"})
eq_(len(req.data["questions"]), 3)
def test_documents_max_results_0(self):
self._make_document()
self.refresh()
# Make sure something matches the query first.
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
eq_(len(req.data["documents"]), 1)
# If we specify "don't give me any" make sure we don't get any.
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_documents": "0"})
eq_(len(req.data["documents"]), 0)
def test_documents_max_results_non_0(self):
self._make_document()
self._make_document()
self._make_document()
self._make_document()
self._make_document()
self.refresh()
# Make sure something matches the query first.
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
eq_(len(req.data["documents"]), 5)
# Make sure we get only 3.
req = self.client.get(reverse("search.suggest"), {"q": "emails", "max_documents": "3"})
eq_(len(req.data["documents"]), 3)
def test_product_filter_works(self):
p1 = ProductFactory()
p2 = ProductFactory()
q1 = self._make_question(product=p1)
self._make_question(product=p2)
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails", "product": p1.slug})
eq_([q["id"] for q in req.data["questions"]], [q1.id])
def test_locale_filter_works_for_questions(self):
q1 = self._make_question(locale="fr")
self._make_question(locale="en-US")
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails", "locale": "fr"})
eq_([q["id"] for q in req.data["questions"]], [q1.id])
def test_locale_filter_works_for_documents(self):
d1 = self._make_document(slug="right-doc", locale="fr")
self._make_document(slug="wrong-doc", locale="en-US")
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails", "locale": "fr"})
eq_([d["slug"] for d in req.data["documents"]], [d1.slug])
def test_serializer_fields(self):
"""Test that fields from the serializer are included."""
self._make_question()
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
# Check that a field that is only available from the DB is in the response.
assert "metadata" in req.data["questions"][0]
def test_only_solved(self):
"""Test that only solved questions are suggested."""
q1 = self._make_question(solved=True)
q2 = self._make_question(solved=False)
self.refresh()
req = self.client.get(reverse("search.suggest"), {"q": "emails"})
ids = [q["id"] for q in req.data["questions"]]
assert q1.id in ids
assert q2.id not in ids
eq_(len(ids), 1)


@@ -1,58 +0,0 @@
from django.core.management import call_command
from unittest import mock
from kitsune.products.tests import ProductFactory
from kitsune.search import es_utils
from kitsune.search.tests import ElasticTestCase
from kitsune.search.utils import FakeLogger
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
class ESCommandTests(ElasticTestCase):
@mock.patch.object(FakeLogger, "_out")
def test_search(self, _out):
"""Test that es_search command doesn't fail"""
call_command("essearch", "cupcakes")
p = ProductFactory(title="firefox", slug="desktop")
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
RevisionFactory(document=doc, is_approved=True)
self.refresh()
call_command("essearch", "cupcakes")
@mock.patch.object(FakeLogger, "_out")
def test_reindex(self, _out):
p = ProductFactory(title="firefox", slug="desktop")
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
RevisionFactory(document=doc, is_approved=True)
self.refresh()
call_command("esreindex")
call_command("esreindex", "--percent=50")
call_command("esreindex", "--seconds-ago=60")
call_command("esreindex", "--criticalmass")
call_command("esreindex", "--mapping_types=wiki_documents")
call_command("esreindex", "--delete")
@mock.patch.object(FakeLogger, "_out")
def test_status(self, _out):
p = ProductFactory(title="firefox", slug="desktop")
doc = DocumentFactory(title="cupcakes rock", locale="en-US", category=10, products=[p])
RevisionFactory(document=doc, is_approved=True)
self.refresh()
call_command("esstatus")
@mock.patch.object(FakeLogger, "_out")
def test_delete(self, _out):
# Note: The read indexes and the write indexes are the same in
# the tests, so we only have to do this once.
indexes = es_utils.all_read_indexes()
indexes.append("cupcakerainbow_index")
for index in indexes:
call_command("esdelete", index, noinput=True)


@@ -1,286 +0,0 @@
# -*- coding: utf-8 -*-
import json
import unittest
from django.contrib.sites.models import Site
from unittest import mock
from nose.tools import eq_
from kitsune.questions.models import QuestionMappingType
from kitsune.questions.tests import QuestionFactory, AnswerFactory, AnswerVoteFactory
from kitsune.search import es_utils
from kitsune.search.models import generate_tasks
from kitsune.search.tests import ElasticTestCase
from kitsune.sumo.urlresolvers import reverse
from kitsune.wiki.models import DocumentMappingType
from kitsune.wiki.tests import DocumentFactory, ApprovedRevisionFactory
class ElasticSearchSuggestionsTests(ElasticTestCase):
@mock.patch.object(Site.objects, "get_current")
def test_invalid_suggestions(self, get_current):
"""The suggestions API needs a query term."""
get_current.return_value.domain = "testserver"
response = self.client.get(reverse("search.suggestions", locale="en-US"))
eq_(400, response.status_code)
assert not response.content
@mock.patch.object(Site.objects, "get_current")
def test_suggestions(self, get_current):
"""Suggestions API is well-formatted."""
get_current.return_value.domain = "testserver"
doc = DocumentFactory(title="doc1 audio", locale="en-US", is_archived=False)
ApprovedRevisionFactory(document=doc, summary="audio", content="audio")
ques = QuestionFactory(title="q1 audio", tags=["desktop"])
# ques.tags.add(u'desktop')
ans = AnswerFactory(question=ques)
AnswerVoteFactory(answer=ans, helpful=True)
self.refresh()
response = self.client.get(reverse("search.suggestions", locale="en-US"), {"q": "audio"})
eq_(200, response.status_code)
eq_("application/x-suggestions+json", response["content-type"])
results = json.loads(response.content)
eq_("audio", results[0])
eq_(2, len(results[1]))
eq_(0, len(results[2]))
eq_(2, len(results[3]))
class TestUtils(ElasticTestCase):
def test_get_documents(self):
q = QuestionFactory()
self.refresh()
docs = es_utils.get_documents(QuestionMappingType, [q.id])
eq_(docs[0]["id"], q.id)
class TestTasks(ElasticTestCase):
@mock.patch.object(QuestionMappingType, "index")
def test_tasks(self, index_fun):
"""Tests to make sure tasks are added and run"""
q = QuestionFactory()
# Don't call self.refresh here since that calls generate_tasks().
eq_(index_fun.call_count, 0)
q.save()
generate_tasks()
eq_(index_fun.call_count, 1)
@mock.patch.object(QuestionMappingType, "index")
def test_tasks_squashed(self, index_fun):
"""Tests to make sure tasks are squashed"""
q = QuestionFactory()
# Don't call self.refresh here since that calls generate_tasks().
eq_(index_fun.call_count, 0)
q.save()
q.save()
q.save()
q.save()
eq_(index_fun.call_count, 0)
generate_tasks()
eq_(index_fun.call_count, 1)
class TestMappings(unittest.TestCase):
def test_mappings(self):
# This is more of a linter than a test. If it passes, then
# everything is fine. If it fails, then it means things are
# not fine. Not fine? Yeah, it means that there are two fields
# with the same name, but different types in the
# mappings that share an index. That doesn't work in ES.
# Doing it as a test seemed like a good idea since
# it's likely to catch epic problems, but isn't in the runtime
# code.
# Verify mappings that share the same index don't conflict
for index in es_utils.all_read_indexes():
merged_mapping = {}
for cls_name, mapping in list(es_utils.get_mappings(index).items()):
mapping = mapping["properties"]
for key, val in list(mapping.items()):
if key not in merged_mapping:
merged_mapping[key] = (val, [cls_name])
continue
# FIXME - We're comparing two dicts here. This might
# not work for non-trivial dicts.
if merged_mapping[key][0] != val:
raise es_utils.MappingMergeError(
"%s key different for %s and %s"
% (key, cls_name, merged_mapping[key][1])
)
merged_mapping[key][1].append(cls_name)
# If we get here, then we're fine.
class TestAnalyzers(ElasticTestCase):
def setUp(self):
super(TestAnalyzers, self).setUp()
self.locale_data = {
"en-US": {
"analyzer": "snowball-english",
"content": "I have a cat.",
},
"es": {
"analyzer": "snowball-spanish",
"content": "Tieno un gato.",
},
"ar": {
"analyzer": "arabic",
"content": "لدي اثنين من القطط",
},
"he": {
"analyzer": "standard",
"content": "גאולוגיה היא אחד",
},
}
self.docs = {}
for locale, data in list(self.locale_data.items()):
d = DocumentFactory(locale=locale)
ApprovedRevisionFactory(document=d, content=data["content"])
self.locale_data[locale]["doc"] = d
self.refresh()
def test_analyzer_choices(self):
"""Check that the indexer picked the right analyzer."""
ids = [d.id for d in list(self.docs.values())]
docs = es_utils.get_documents(DocumentMappingType, ids)
for doc in docs:
locale = doc["locale"]
eq_(doc["_analyzer"], self.locale_data[locale]["analyzer"])
def test_query_analyzer_upgrader(self):
analyzer = "snowball-english-synonyms"
before = {
"document_title__match": "foo",
"document_locale__match": "bar",
"document_title__match_phrase": "baz",
"document_locale__match_phrase": "qux",
}
expected = {
"document_title__match_analyzer": ("foo", analyzer),
"document_locale__match": "bar",
"document_title__match_phrase_analyzer": ("baz", analyzer),
"document_locale__match_phrase": "qux",
}
actual = es_utils.es_query_with_analyzer(before, "en-US")
eq_(actual, expected)
def _check_locale_tokenization(self, locale, expected_tokens, p_tag=True):
"""
Check that a given locale's document was tokenized correctly.
* `locale` - The locale to check.
* `expected_tokens` - An iterable of the tokens that should be
found. If any tokens from this list are missing, or if any
tokens not in this list are found, the check will fail.
* `p_tag` - Default True. If True, an extra token will be added
to `expected_tokens`: "p".
This is because our wiki parser wraps its content in <p>
tags and many analyzers will tokenize a string like
'<p>Foo</p>' as ['p', 'foo'] (the HTML tag is included in
the tokenization). So this will show up in the tokenization
during this test. Not all the analyzers do this, which is
why it can be turned off.
Why can't we fix the analyzers to strip out that HTML, and not
generate spurious tokens? That could probably be done, but it
probably isn't worthwhile because:
* ES will weight common words lower, thanks to its TF-IDF
algorithm, which judges words based on how often they
appear in the entire corpus and in the document, so the p
tokens will be largely ignored.
* The pre-l10n search code did it this way, so it doesn't
break search.
* When implementing l10n search, I wanted to minimize the
number of changes needed, and this seemed like an unneeded
change.
"""
search = es_utils.Sphilastic(DocumentMappingType)
search = search.filter(document_locale=locale)
facet_filter = search._process_filters([("document_locale", locale)])
search = search.facet_raw(
tokens={"terms": {"field": "document_content"}, "facet_filter": facet_filter}
)
facets = search.facet_counts()
expected = set(expected_tokens)
if p_tag:
# Since `expected` is a set, there is no problem adding this
# twice, since duplicates will be ignored.
expected.add("p")
actual = set(t["term"] for t in facets["tokens"])
eq_(actual, expected)
# These 4 languages were chosen for tokenization testing because
# they represent the 4 kinds of languages we have: English, Snowball
# supported languages, ES supported languages and languages with no
# analyzer, which use the standard analyzer. There is another
# possible case, which is a custom analyzer, but we don't have any
# of those right now.
def test_english_tokenization(self):
"""Test that English stemming and stop words work."""
self._check_locale_tokenization("en-US", ["i", "have", "cat"])
def test_spanish_tokenization(self):
"""Test that Spanish stemming and stop words work."""
self._check_locale_tokenization("es", ["tien", "un", "gat"])
def test_arabic_tokenization(self):
"""Test that Arabic stemming works.
I don't read Arabic; this is just what ES gave me when I asked
it to analyze an Arabic text as Arabic. If someone who reads
Arabic can improve this test, go for it!
"""
self._check_locale_tokenization("ar", ["لد", "اثن", "قطط"])
def test_hebrew_tokenization(self):
"""Test that Hebrew uses the standard analyzer."""
tokens = ["גאולוגיה", "היא", "אחד"]
self._check_locale_tokenization("he", tokens)
class TestGetAnalyzerForLocale(ElasticTestCase):
def test_default(self):
actual = es_utils.es_analyzer_for_locale("en-US")
eq_("snowball-english", actual)
def test_without_synonyms(self):
actual = es_utils.es_analyzer_for_locale("en-US", synonyms=False)
eq_("snowball-english", actual)
def test_with_synonyms_right_locale(self):
actual = es_utils.es_analyzer_for_locale("en-US", synonyms=True)
eq_("snowball-english-synonyms", actual)
def test_with_synonyms_wrong_locale(self):
actual = es_utils.es_analyzer_for_locale("es", synonyms=True)
eq_("snowball-spanish", actual)


@@ -13,7 +13,6 @@ class OpenSearchTestCase(TestCase):
# FIXME: This is silly. The better test would be to parse out
# the content and then go through and make sure all the urls
# were correct.
assert b"http://testserver/fr/search/suggestions" in response.content
assert b"en-US" not in response.content
def test_plugin_expires_and_mimetype(self):


@@ -1,307 +0,0 @@
import json
from django.conf import settings
from django.utils.http import urlquote
from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.forums.tests import PostFactory, ThreadFactory
from kitsune.products.tests import ProductFactory
from kitsune.questions.tests import AnswerFactory, AnswerVoteFactory, QuestionFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.sumo.tests import LocalizingClient
from kitsune.sumo.urlresolvers import reverse
from kitsune.wiki.tests import ApprovedRevisionFactory, DocumentFactory, RevisionFactory
class SimpleSearchTests(ElasticTestCase):
client_class = LocalizingClient
def test_content(self):
"""Ensure template is rendered with no errors for a common search"""
response = self.client.get(reverse("search"), {"q": "audio"})
eq_("text/html; charset=utf-8", response["Content-Type"])
eq_(200, response.status_code)
def test_search_type_param(self):
"""Ensure that invalid values for search type (a=)
does not cause errors"""
response = self.client.get(reverse("search"), {"a": "dontdie"})
eq_("text/html; charset=utf-8", response["Content-Type"])
eq_(200, response.status_code)
def test_headers(self):
"""Verify caching headers of search forms and search results"""
response = self.client.get(reverse("search"), {"q": "audio"})
eq_("max-age=%s" % (settings.SEARCH_CACHE_PERIOD * 60), response["Cache-Control"])
assert "Expires" in response
response = self.client.get(reverse("search"))
eq_("max-age=%s" % (settings.SEARCH_CACHE_PERIOD * 60), response["Cache-Control"])
assert "Expires" in response
def test_json_format(self):
"""JSON without callback should return application/json"""
response = self.client.get(
reverse("search"),
{
"q": "bookmarks",
"format": "json",
},
)
eq_(response["Content-Type"], "application/json")
def test_json_callback_validation(self):
"""Various json callbacks -- validation"""
response = self.client.get(
reverse("search"),
{
"q": "bookmarks",
"format": "json",
"callback": "callback",
},
)
eq_(response["Content-Type"], "application/x-javascript")
eq_(response.status_code, 200)
def test_page_invalid(self):
"""Ensure non-integer param doesn't throw exception."""
doc = DocumentFactory(
title="How to fix your audio", locale="en-US", category=10, tags="desktop"
)
ApprovedRevisionFactory(document=doc)
self.refresh()
response = self.client.get(
reverse("search"), {"q": "audio", "format": "json", "page": "invalid"}
)
eq_(200, response.status_code)
eq_(1, json.loads(response.content)["total"])
def test_clean_question_excerpt(self):
"""Ensure we clean html out of question excerpts."""
q = QuestionFactory(title="audio", content='<script>alert("hacked");</script>')
a = AnswerFactory(question=q)
AnswerVoteFactory(answer=a, helpful=True)
self.refresh()
response = self.client.get(reverse("search"), {"q": "audio"})
eq_(200, response.status_code)
doc = pq(response.content)
assert "script" not in doc("div.result").text()
def test_fallback_for_zero_results(self):
"""If there are no results, fallback to a list of top articles."""
firefox = ProductFactory(title="firefox", slug="desktop")
doc = DocumentFactory(title="audio1", locale="en-US", category=10, products=[firefox])
RevisionFactory(document=doc, is_approved=True)
doc = DocumentFactory(title="audio2", locale="en-US", category=10, products=[firefox])
RevisionFactory(document=doc, is_approved=True)
self.refresh()
# Verify there are no real results but 2 fallback results are rendered
response = self.client.get(reverse("search"), {"q": "piranha"})
eq_(200, response.status_code)
assert b"We couldn't find any results for" in response.content
doc = pq(response.content)
eq_(2, len(doc("#search-results .result")))
def test_meta_tags(self):
"""Tests that the search results page has the right meta tags"""
url_ = reverse("search")
response = self.client.get(url_, {"q": "contribute"})
doc = pq(response.content)
eq_(doc('meta[name="WT.oss"]')[0].attrib["content"], "contribute")
eq_(doc('meta[name="WT.oss_r"]')[0].attrib["content"], "0")
eq_(doc('meta[name="robots"]')[0].attrib["content"], "noindex")
def test_search_cookie(self):
"""Set a cookie with the latest search term."""
data = {"q": "pagap\xf3 banco"}
cookie = settings.LAST_SEARCH_COOKIE
response = self.client.get(reverse("search", locale="fr"), data)
assert cookie in response.cookies
eq_(urlquote(data["q"]), response.cookies[cookie].value)
def test_empty_pages(self):
"""Tests requesting a page that has no results"""
ques = QuestionFactory(title="audio")
ques.tags.add("desktop")
ans = AnswerFactory(question=ques, content="volume")
AnswerVoteFactory(answer=ans, helpful=True)
self.refresh()
qs = {"q": "audio", "page": 81}
response = self.client.get(reverse("search"), qs)
eq_(200, response.status_code)
def test_include_questions(self):
"""This tests whether doing a simple search returns
question results.
Bug #709202.
"""
# Create a question with an answer with an answervote that
# marks the answer as helpful. The question should have the
# "desktop" tag.
p = ProductFactory(title="firefox", slug="desktop")
ques = QuestionFactory(title="audio", product=p)
ans = AnswerFactory(question=ques, content="volume")
AnswerVoteFactory(answer=ans, helpful=True)
self.refresh()
# This is the search that you get when you start on the sumo
# homepage and do a search from the box with two differences:
# first, we do it in json since it's easier to deal with
# testing-wise and second, we search for 'audio' since we have
# data for that.
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 1)
# This is another search that picks up results based on the
# answer_content. answer_content is in a string array, so
# this makes sure that works.
response = self.client.get(reverse("search"), {"q": "volume", "format": "json"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 1)
def test_include_wiki(self):
"""This tests whether doing a simple search returns wiki document
results.
Bug #709202.
"""
doc = DocumentFactory(title="audio", locale="en-US", category=10)
doc.products.add(ProductFactory(title="firefox", slug="desktop"))
RevisionFactory(document=doc, is_approved=True)
self.refresh()
# This is the search that you get when you start on the sumo
# homepage and do a search from the box with two differences:
# first, we do it in json since it's easier to deal with
# testing-wise and second, we search for 'audio' since we have
# data for that.
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 1)
def test_only_show_wiki_and_questions(self):
"""Tests that the simple search doesn't show forums
This verifies that we're only showing documents of the type
that should be shown and that the filters on model are working
correctly.
Bug #767394
"""
p = ProductFactory(slug="desktop")
ques = QuestionFactory(title="audio", product=p)
ans = AnswerFactory(question=ques, content="volume")
AnswerVoteFactory(answer=ans, helpful=True)
doc = DocumentFactory(title="audio", locale="en-US", category=10)
doc.products.add(p)
RevisionFactory(document=doc, is_approved=True)
thread1 = ThreadFactory(title="audio")
PostFactory(thread=thread1)
self.refresh()
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 2)
# Archive the article and question. They should no longer appear
# in simple search results.
ques.is_archived = True
ques.save()
doc.is_archived = True
doc.save()
self.refresh()
response = self.client.get(reverse("search"), {"q": "audio", "format": "json"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 0)
def test_filter_by_product(self):
desktop = ProductFactory(slug="desktop")
mobile = ProductFactory(slug="mobile")
ques = QuestionFactory(title="audio", product=desktop)
ans = AnswerFactory(question=ques, content="volume")
AnswerVoteFactory(answer=ans, helpful=True)
doc = DocumentFactory(title="audio", locale="en-US", category=10)
doc.products.add(desktop)
doc.products.add(mobile)
RevisionFactory(document=doc, is_approved=True)
self.refresh()
# There should be 2 results for desktop and 1 for mobile.
response = self.client.get(
reverse("search"), {"q": "audio", "format": "json", "product": "desktop"}
)
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 2)
response = self.client.get(
reverse("search"), {"q": "audio", "format": "json", "product": "mobile"}
)
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 1)
def test_filter_by_doctype(self):
desktop = ProductFactory(slug="desktop")
ques = QuestionFactory(title="audio", product=desktop)
ans = AnswerFactory(question=ques, content="volume")
AnswerVoteFactory(answer=ans, helpful=True)
doc = DocumentFactory(title="audio", locale="en-US", category=10, products=[desktop])
RevisionFactory(document=doc, is_approved=True)
doc = DocumentFactory(title="audio too", locale="en-US", category=10, products=[desktop])
RevisionFactory(document=doc, is_approved=True)
self.refresh()
# There should be 2 results for kb (w=1) and 1 for questions (w=2).
response = self.client.get(reverse("search"), {"q": "audio", "format": "json", "w": "1"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 2)
response = self.client.get(reverse("search"), {"q": "audio", "format": "json", "w": "2"})
eq_(200, response.status_code)
content = json.loads(response.content)
eq_(content["total"], 1)


@@ -1,55 +0,0 @@
from nose.tools import ok_
from kitsune.search.forms import SimpleSearchForm
from kitsune.search.search_utils import generate_simple_search
from kitsune.sumo.tests import TestCase
class SimpleSearchTests(TestCase):
def test_language_en_us(self):
form = SimpleSearchForm({"q": "foo"})
ok_(form.is_valid())
s = generate_simple_search(form, "en-US", with_highlights=False)
# NB: Comparing bits of big trees is hard, so we serialize it
# and look for strings.
s_string = str(s.build_search())
# Verify locale
ok_("{'term': {'document_locale': 'en-US'}}" in s_string)
# Verify en-US has the right synonym-enhanced analyzer
ok_("'analyzer': 'snowball-english-synonyms'" in s_string)
def test_language_fr(self):
form = SimpleSearchForm({"q": "foo"})
ok_(form.is_valid())
s = generate_simple_search(form, "fr", with_highlights=False)
s_string = str(s.build_search())
# Verify locale
ok_("{'term': {'document_locale': 'fr'}}" in s_string)
# Verify fr has right synonym-less analyzer
ok_("'analyzer': 'snowball-french'" in s_string)
def test_language_zh_cn(self):
form = SimpleSearchForm({"q": "foo"})
ok_(form.is_valid())
s = generate_simple_search(form, "zh-CN", with_highlights=False)
s_string = str(s.build_search())
# Verify locale
ok_("{'term': {'document_locale': 'zh-CN'}}" in s_string)
# Verify standard analyzer is used
ok_("'analyzer': 'chinese'" in s_string)
def test_with_highlights(self):
form = SimpleSearchForm({"q": "foo"})
ok_(form.is_valid())
s = generate_simple_search(form, "en-US", with_highlights=True)
ok_("highlight" in s.build_search())
s = generate_simple_search(form, "en-US", with_highlights=False)
ok_("highlight" not in s.build_search())


@@ -1,118 +0,0 @@
from textwrap import dedent
from nose.tools import eq_
from pyquery import PyQuery as pq
from kitsune.search import es_utils, synonym_utils
from kitsune.search.tasks import update_synonyms_task
from kitsune.search.tests import ElasticTestCase, SynonymFactory
from kitsune.sumo.tests import LocalizingClient, TestCase
from kitsune.sumo.urlresolvers import reverse
from kitsune.wiki.tests import DocumentFactory, RevisionFactory
class TestSynonymModel(TestCase):
def test_serialize(self):
syn = SynonymFactory(from_words="foo", to_words="bar")
eq_("foo => bar", str(syn))
class TestFilterGenerator(TestCase):
def test_name(self):
"""Test that the right name is returned."""
name, _ = es_utils.es_get_synonym_filter("en-US")
eq_(name, "synonyms-en-US")
def test_no_synonyms(self):
"""Test that when there are no synonyms an alternate filter is made."""
_, body = es_utils.es_get_synonym_filter("en-US")
eq_(
body,
{
"type": "synonym",
"synonyms": ["firefox => firefox"],
},
)
def test_with_some_synonyms(self):
SynonymFactory(from_words="foo", to_words="bar")
SynonymFactory(from_words="baz", to_words="qux")
_, body = es_utils.es_get_synonym_filter("en-US")
expected = {
"type": "synonym",
"synonyms": [
"foo => bar",
"baz => qux",
],
}
eq_(body, expected)
class TestSynonymParser(TestCase):
def testItWorks(self):
synonym_text = dedent(
"""
one, two => apple, banana
three => orange, grape
four, five => jellybean
"""
)
synonyms = {
("one, two", "apple, banana"),
("three", "orange, grape"),
("four, five", "jellybean"),
}
eq_(synonyms, synonym_utils.parse_synonyms(synonym_text))
def testTooManyArrows(self):
try:
synonym_utils.parse_synonyms("foo => bar => baz")
except synonym_utils.SynonymParseError as e:
eq_(len(e.errors), 1)
else:
assert False, "Parser did not catch error as expected."
def testTooFewArrows(self):
try:
synonym_utils.parse_synonyms("foo, bar, baz")
except synonym_utils.SynonymParseError as e:
eq_(len(e.errors), 1)
else:
assert False, "Parser did not catch error as expected."
class SearchViewWithSynonyms(ElasticTestCase):
client_class = LocalizingClient
def test_synonyms_work_in_search_view(self):
d1 = DocumentFactory(title="frob")
d2 = DocumentFactory(title="glork")
RevisionFactory(document=d1, is_approved=True)
RevisionFactory(document=d2, is_approved=True)
self.refresh()
# First search without synonyms
response = self.client.get(reverse("search"), {"q": "frob"})
doc = pq(response.content)
header = doc.find("#search-results h2").text().strip()
eq_(header, "Found 1 result for frob for All Products")
# Now add a synonym.
SynonymFactory(from_words="frob", to_words="frob, glork")
update_synonyms_task()
self.refresh()
# Forward search
response = self.client.get(reverse("search"), {"q": "frob"})
doc = pq(response.content)
header = doc.find("#search-results h2").text().strip()
eq_(header, "Found 2 results for frob for All Products")
# Reverse search
response = self.client.get(reverse("search"), {"q": "glork"})
doc = pq(response.content)
header = doc.find("#search-results h2").text().strip()
eq_(header, "Found 1 result for glork for All Products")


@@ -1,36 +0,0 @@
from nose.tools import eq_
from kitsune.search.utils import chunked, from_class_path, to_class_path
from kitsune.sumo.tests import TestCase
class ChunkedTests(TestCase):
def test_chunked(self):
# chunking nothing yields nothing.
eq_(list(chunked([], 1)), [])
# chunking list where len(list) < n
eq_(list(chunked([1], 10)), [(1,)])
# chunking a list where len(list) == n
eq_(list(chunked([1, 2], 2)), [(1, 2)])
# chunking list where len(list) > n
eq_(list(chunked([1, 2, 3, 4, 5], 2)), [(1, 2), (3, 4), (5,)])
class FooBarClassOfAwesome(object):
pass
def test_from_class_path():
eq_(
from_class_path("kitsune.search.tests.test_utils:FooBarClassOfAwesome"),
FooBarClassOfAwesome,
)
def test_to_class_path():
eq_(
to_class_path(FooBarClassOfAwesome), "kitsune.search.tests.test_utils:FooBarClassOfAwesome"
)


@@ -6,5 +6,4 @@ from kitsune.search.v2 import views as v2_views
urlpatterns = [
url(r"^$", v2_views.simple_search, name="search"),
url(r"^/xml$", views.opensearch_plugin, name="search.plugin"),
url(r"^/suggestions$", views.opensearch_suggestions, name="search.suggestions"),
]


@@ -1,8 +0,0 @@
from django.conf.urls import url
from kitsune.search import api
# API urls. Prefixed with /api/2/
urlpatterns = [
url("^search/suggest/$", api.suggest, name="search.suggest"),
]


@@ -1,43 +1,8 @@
import time
from itertools import islice
from django.conf import settings
import bleach
from kitsune.lib.sumo_locales import LOCALES
class FakeLogger(object):
"""Fake logger that we can pretend is a Python Logger
Why? Well, because Django has logging settings that prevent me
from setting up a logger here that uses the stdout that the Django
BaseCommand has. At some point while fiddling with it, I
figured, 'screw it--I'll just write my own' and did.
The minor ramification is that this isn't a complete
implementation so if it's missing stuff, we'll have to add it.
"""
def __init__(self, stdout):
self.stdout = stdout
def _out(self, level, msg, *args):
msg = msg % args
self.stdout.write("%s %-8s: %s\n" % (time.strftime("%H:%M:%S"), level, msg))
def info(self, msg, *args):
self._out("INFO", msg, *args)
def error(self, msg, *args):
self._out("ERROR", msg, *args)
def clean_excerpt(excerpt):
return bleach.clean(excerpt, tags=["b", "i"])
def locale_or_default(locale):
"""Return `locale` or, if `locale` isn't a known locale, a default.
@@ -47,57 +12,3 @@ def locale_or_default(locale):
if locale not in LOCALES:
locale = settings.LANGUAGE_CODE
return locale
def chunked(iterable, n):
"""Returns chunks of n length of iterable
If len(iterable) % n != 0, then the last chunk will have length
less than n.
Example:
>>> list(chunked([1, 2, 3, 4, 5], 2))
[(1, 2), (3, 4), (5,)]
"""
iterable = iter(iterable)
while True:
t = tuple(islice(iterable, n))
if t:
yield t
else:
return
def to_class_path(cls):
"""Returns class path for a class
Takes a class and returns the class path which is composed of the
module plus the class name. This can be reversed later to get the
class using ``from_class_path``.
:returns: string
>>> from kitsune.search.models import Record
>>> to_class_path(Record)
'kitsune.search.models:Record'
"""
return ":".join([cls.__module__, cls.__name__])
def from_class_path(cls_path):
"""Returns the class
Takes a class path and returns the class for it.
:returns: varies
>>> from_class_path('kitsune.search.models:Record')
<Record ...>
"""
module_path, cls_name = cls_path.split(":")
module = __import__(module_path, fromlist=[cls_name])
return getattr(module, cls_name)
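The three helpers above were typically combined when fanning indexing work out in batches. The sketch below is illustrative only, assuming chunked(), to_class_path() and from_class_path() are in scope; index_batch is a hypothetical stand-in for a real task and is not part of this commit.

# Illustrative sketch of composing the removed helpers above.
# index_batch() is a hypothetical stand-in for a Celery task.
def index_batch(cls_path, ids):
    cls = from_class_path(cls_path)          # resolve the "module:Class" path
    print("indexing %d objects via %s" % (len(ids), cls.__name__))

def index_in_batches(mapping_type, ids, batch_size=100):
    cls_path = to_class_path(mapping_type)   # a serializable class reference
    for batch in chunked(ids, batch_size):   # tuples of at most batch_size ids
        index_batch(cls_path, list(batch))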


@@ -1,32 +1,15 @@
import json
import logging
from datetime import datetime, timedelta
from itertools import chain
import bleach
import jinja2
from django.http import HttpResponse, HttpResponseBadRequest
from django.shortcuts import render_to_response
from django.utils.html import escape
from django.utils.translation import pgettext_lazy
from django.utils.translation import ugettext as _
from django.views.decorators.cache import cache_page
from elasticutils.contrib.django import ES_EXCEPTIONS
from elasticutils.utils import format_explanation
from kitsune import search as constants
from kitsune.products.models import Product
from kitsune.search.forms import SimpleSearchForm
from kitsune.search.search_utils import generate_simple_search
from kitsune.search.utils import clean_excerpt, locale_or_default
from kitsune.wiki.facets import documents_for
log = logging.getLogger("k.search")
EXCERPT_JOINER = pgettext_lazy("between search excerpts", "...")
def cache_control(resp, cache_period):
"""Inserts cache/expires headers"""
resp["Cache-Control"] = "max-age=%s" % (cache_period * 60)
@@ -36,110 +19,6 @@ def cache_control(resp, cache_period):
return resp
def _es_down_template(request, *args, **kwargs):
"""Returns the appropriate "Elasticsearch is down!" template"""
return "search/down.html"
class UnknownDocType(Exception):
"""Signifies a doctype for which there's no handling"""
pass
def build_results_list(pages, is_json):
"""Takes a paginated search and returns results List
Handles wiki documents, questions and contributor forum posts.
:arg pages: paginated S
:arg is_json: whether or not this is generated results for json output
:returns: list of dicts
"""
results = []
for rank, doc in enumerate(pages, pages.start_index()):
if doc["model"] == "wiki_document":
summary = _build_es_excerpt(doc)
if not summary:
summary = doc["document_summary"]
result = {"title": doc["document_title"], "type": "document"}
elif doc["model"] == "questions_question":
summary = _build_es_excerpt(doc)
if not summary:
# We're excerpting only question_content, so if the query matched
# question_title or question_answer_content, then there won't be any
# question_content excerpts. In that case, just show the question--but
# only the first 500 characters.
summary = bleach.clean(doc["question_content"], strip=True)[:500]
result = {
"title": doc["question_title"],
"type": "question",
"last_updated": datetime.fromtimestamp(doc["updated"]),
"is_solved": doc["question_is_solved"],
"num_answers": doc["question_num_answers"],
"num_votes": doc["question_num_votes"],
"num_votes_past_week": doc["question_num_votes_past_week"],
}
elif doc["model"] == "forums_thread":
summary = _build_es_excerpt(doc, first_only=True)
result = {"title": doc["post_title"], "type": "thread"}
else:
raise UnknownDocType("%s is an unknown doctype" % doc["model"])
result["url"] = doc["url"]
if not is_json:
result["object"] = doc
result["search_summary"] = summary
result["rank"] = rank
result["score"] = doc.es_meta.score
result["explanation"] = escape(format_explanation(doc.es_meta.explanation))
result["id"] = doc["id"]
results.append(result)
return results
@cache_page(60 * 15) # 15 minutes.
def opensearch_suggestions(request):
"""A simple search view that returns OpenSearch suggestions."""
content_type = "application/x-suggestions+json"
search_form = SimpleSearchForm(request.GET, auto_id=False)
if not search_form.is_valid():
return HttpResponseBadRequest(content_type=content_type)
cleaned = search_form.cleaned_data
language = locale_or_default(cleaned["language"] or request.LANGUAGE_CODE)
searcher = generate_simple_search(search_form, language, with_highlights=False)
searcher = searcher.values_dict("document_title", "question_title", "url")
results = searcher[:10]
def urlize(r):
return "%s://%s%s" % (
"https" if request.is_secure() else "http",
request.get_host(),
r["url"][0],
)
def titleize(r):
# NB: Elasticsearch returns an array of strings as the value, so we mimic that and
# then pull out the first (and only) string.
return r.get("document_title", r.get("question_title", [_("No title")]))[0]
try:
data = [cleaned["q"], [titleize(r) for r in results], [], [urlize(r) for r in results]]
except ES_EXCEPTIONS:
# If we have Elasticsearch problems, we just send back an empty set of results.
data = []
return HttpResponse(json.dumps(data), content_type=content_type)
@cache_page(60 * 60 * 168) # 1 week.
def opensearch_plugin(request):
"""Render an OpenSearch Plugin."""
@@ -156,33 +35,6 @@ def opensearch_plugin(request):
)
def _ternary_filter(ternary_value):
"""Return a search query given a TERNARY_YES or TERNARY_NO.
Behavior for TERNARY_OFF is undefined.
"""
return ternary_value == constants.TERNARY_YES
def _build_es_excerpt(result, first_only=False):
"""Return concatenated search excerpts.
:arg result: The result object from the queryset results
:arg first_only: True if we should show only the first bit, False
if we should show all bits
"""
bits = [m.strip() for m in chain(*list(result.es_meta.highlight.values()))]
if first_only and bits:
excerpt = bits[0]
else:
excerpt = EXCERPT_JOINER.join(bits)
return jinja2.Markup(clean_excerpt(excerpt))
def _fallback_results(locale, product_slugs):
"""Return the top 20 articles by votes for the given product(s)."""
products = []


@@ -1,31 +1,26 @@
# -*- coding: utf-8 -*-
import inspect
import os
import subprocess
import sys
from functools import wraps
from os import getenv
from smtplib import SMTPRecipientsRefused
import subprocess
import django_nose
import factory.fuzzy
from django.conf import settings
from django.core.cache import cache
from django.test import TestCase as OriginalTestCase
from django.test.client import Client
from django.test.utils import override_settings
from django.utils.translation import trans_real
import django_nose
import factory.fuzzy
from elasticutils.contrib.django import get_es
from nose.tools import eq_
from pyquery import PyQuery
from waffle.models import Flag
from kitsune.search import es_utils
from kitsune.search.models import generate_tasks
from kitsune.sumo.urlresolvers import reverse, split_path
# We do this gooftastic thing because nose uses unittest.SkipTest in
# Python 2.7 which doesn't work with the whole --no-skip thing.
# TODO: Check this after the upgrade
@@ -76,63 +71,18 @@ class TestCase(OriginalTestCase):
trans_real.activate(settings.LANGUAGE_CODE)
super(TestCase, self)._pre_setup()
def reindex_and_refresh(self):
"""Reindexes anything in the db"""
from kitsune.search.es_utils import es_reindex_cmd
es_reindex_cmd()
self.refresh(run_tasks=False)
def setup_indexes(self, empty=False, wait=True):
"""(Re-)create write index"""
from kitsune.search.es_utils import recreate_indexes
recreate_indexes()
get_es().cluster.health(wait_for_status="yellow")
def teardown_indexes(self):
"""Tear down write index"""
for index in es_utils.all_write_indexes():
es_utils.delete_index(index)
@classmethod
def setUpClass(cls):
super(TestCase, cls).setUpClass()
if not getattr(settings, "ES_URLS"):
cls.skipme = True
return
# try to connect to ES and if it fails, skip ElasticTestCases.
if not get_es().ping():
cls.skipme = True
return
def setUp(self):
if self.skipme:
raise SkipTest
super(TestCase, self).setUp()
self.setup_indexes()
def tearDown(self):
super(TestCase, self).tearDown()
self.teardown_indexes()
def refresh(self, run_tasks=True):
es = get_es()
if run_tasks:
# Any time we're doing a refresh, we're making sure that
# the index is ready to be queried. Given that, it's
# almost always the case that we want to run all the
# generated tasks, then refresh.
generate_tasks()
for index in es_utils.all_write_indexes():
es.indices.refresh(index=index)
es.cluster.health(wait_for_status="yellow")
def attrs_eq(received, **expected):


@@ -56,7 +56,6 @@ urlpatterns = [
# v2 APIs
url(r"^api/2/", include("kitsune.notifications.urls_api")),
url(r"^api/2/", include("kitsune.questions.urls_api")),
url(r"^api/2/", include("kitsune.search.urls_api")),
url(r"^api/2/", include("kitsune.sumo.urls_api")),
# These API urls include both v1 and v2 urls.
url(r"^api/", include("kitsune.users.urls_api")),


@@ -1,12 +0,0 @@
from django.core.management.base import BaseCommand
from kitsune.search.tasks import index_task
from kitsune.search.utils import to_class_path
from kitsune.wiki.models import DocumentMappingType
class Command(BaseCommand):
help = "Reindex wiki_document."
def handle(self, **options):
index_task.delay(to_class_path(DocumentMappingType), DocumentMappingType.get_indexable())


@@ -1,6 +1,5 @@
import hashlib
import logging
import time
from datetime import datetime, timedelta
from urllib.parse import urlparse
@@ -21,13 +20,6 @@ from tidings.models import NotificationsMixin
from kitsune.gallery.models import Image
from kitsune.products.models import Product, Topic
from kitsune.search.es_utils import UnindexMeBro, es_analyzer_for_locale
from kitsune.search.models import (
SearchMappingType,
SearchMixin,
register_for_indexing,
register_mapping_type,
)
from kitsune.sumo.apps import ProgrammingError
from kitsune.sumo.models import LocaleField, ModelBase
from kitsune.sumo.urlresolvers import reverse, split_path
@@ -51,6 +43,7 @@ from kitsune.wiki.config import (
from kitsune.wiki.permissions import DocumentPermissionMixin
log = logging.getLogger("k.wiki")
MAX_REVISION_COMMENT_LENGTH = 255
class TitleCollision(Exception):
@@ -65,9 +58,7 @@ class _NotDocumentView(Exception):
"""A URL not pointing to the document view was passed to from_url()."""
class Document(
NotificationsMixin, ModelBase, BigVocabTaggableMixin, SearchMixin, DocumentPermissionMixin
):
class Document(NotificationsMixin, ModelBase, BigVocabTaggableMixin, DocumentPermissionMixin):
"""A localized knowledgebase document, not revision-specific."""
title = models.CharField(max_length=255, db_index=True)
@@ -676,10 +667,6 @@ class Document(
revision__document=self, created__gt=start, helpful=True
).count()
@classmethod
def get_mapping_type(cls):
return DocumentMappingType
def parse_and_calculate_links(self):
"""Calculate What Links Here data for links going out from this.
@@ -736,158 +723,6 @@ class Document(
cache.delete(doc_html_cache_key(self.locale, self.slug))
@register_mapping_type
class DocumentMappingType(SearchMappingType):
seconds_ago_filter = "current_revision__created__gte"
list_keys = ["topic", "product"]
@classmethod
def get_model(cls):
return Document
@classmethod
def get_query_fields(cls):
return ["document_title", "document_content", "document_summary", "document_keywords"]
@classmethod
def get_localized_fields(cls):
# This is the same list as `get_query_fields`, but it doesn't
# have to be, which is why it is typed twice.
return ["document_title", "document_content", "document_summary", "document_keywords"]
@classmethod
def get_mapping(cls):
return {
"properties": {
# General fields
"id": {"type": "long"},
"model": {"type": "string", "index": "not_analyzed"},
"url": {"type": "string", "index": "not_analyzed"},
"indexed_on": {"type": "integer"},
"updated": {"type": "integer"},
"product": {"type": "string", "index": "not_analyzed"},
"topic": {"type": "string", "index": "not_analyzed"},
# Document specific fields (locale aware)
"document_title": {"type": "string", "analyzer": "snowball"},
"document_keywords": {"type": "string", "analyzer": "snowball"},
"document_content": {
"type": "string",
"store": "yes",
"analyzer": "snowball",
"term_vector": "with_positions_offsets",
},
"document_summary": {
"type": "string",
"store": "yes",
"analyzer": "snowball",
"term_vector": "with_positions_offsets",
},
# Document specific fields (locale naive)
"document_locale": {"type": "string", "index": "not_analyzed"},
"document_current_id": {"type": "integer"},
"document_parent_id": {"type": "integer"},
"document_category": {"type": "integer"},
"document_slug": {"type": "string", "index": "not_analyzed"},
"document_is_archived": {"type": "boolean"},
"document_recent_helpful_votes": {"type": "integer"},
"document_display_order": {"type": "integer"},
}
}
@classmethod
def extract_document(cls, obj_id, obj=None):
if obj is None:
model = cls.get_model()
obj = model.objects.select_related("current_revision", "parent").get(pk=obj_id)
if obj.html.startswith(REDIRECT_HTML):
# It's possible this document is indexed and was turned
# into a redirect, so now we want to explicitly unindex
# it. The way we do that is by throwing an exception
# which gets handled by the indexing machinery.
raise UnindexMeBro()
d = {}
d["id"] = obj.id
d["model"] = cls.get_mapping_type_name()
d["url"] = obj.get_absolute_url()
d["indexed_on"] = int(time.time())
d["topic"] = [t.slug for t in obj.get_topics()]
d["product"] = [p.slug for p in obj.get_products()]
d["document_title"] = obj.title
d["document_locale"] = obj.locale
d["document_parent_id"] = obj.parent.id if obj.parent else None
d["document_content"] = obj.html
d["document_category"] = obj.category
d["document_slug"] = obj.slug
d["document_is_archived"] = obj.is_archived
d["document_display_order"] = obj.original.display_order
d["document_summary"] = obj.summary
if obj.current_revision is not None:
d["document_keywords"] = obj.current_revision.keywords
d["updated"] = int(time.mktime(obj.current_revision.created.timetuple()))
d["document_current_id"] = obj.current_revision.id
d["document_recent_helpful_votes"] = obj.recent_helpful_votes
else:
d["document_summary"] = None
d["document_keywords"] = None
d["updated"] = None
d["document_current_id"] = None
d["document_recent_helpful_votes"] = 0
# Don't query for helpful votes if the document doesn't have a current
# revision, or is a template, or is a redirect, or is in Navigation
# category (50).
if (
obj.current_revision
and not obj.is_template
and not obj.html.startswith(REDIRECT_HTML)
and not obj.category == 50
):
d["document_recent_helpful_votes"] = obj.recent_helpful_votes
else:
d["document_recent_helpful_votes"] = 0
# Select a locale-appropriate default analyzer for all strings.
d["_analyzer"] = es_analyzer_for_locale(obj.locale)
return d
@classmethod
def get_indexable(cls, seconds_ago=0):
# This function returns all the indexable things, but we
# really need to handle the case where something was indexable
# and isn't anymore. Given that, this returns everything that
# has a revision.
indexable = super(cls, cls).get_indexable(seconds_ago=seconds_ago)
indexable = indexable.filter(current_revision__isnull=False)
return indexable
@classmethod
def index(cls, document, **kwargs):
# If there are no revisions or the current revision is a
# redirect, we want to remove it from the index.
if document["document_current_id"] is None or document["document_content"].startswith(
REDIRECT_HTML
):
cls.unindex(document["id"], es=kwargs.get("es", None))
return
super(cls, cls).index(document, **kwargs)
register_for_indexing("wiki", Document)
register_for_indexing("wiki", Document.topics.through, m2m=True)
register_for_indexing("wiki", Document.products.through, m2m=True)
MAX_REVISION_COMMENT_LENGTH = 255
class AbstractRevision(models.Model):
# **%(class)s** is being used because it will allow a unique reverse name for the field
# like created_revisions and created_draftrevisions
@@ -904,7 +739,7 @@ class AbstractRevision(models.Model):
abstract = True
class Revision(ModelBase, SearchMixin, AbstractRevision):
class Revision(ModelBase, AbstractRevision):
"""A revision of a localized knowledgebase document"""
summary = models.TextField() # wiki markup
@@ -1135,12 +970,8 @@ class Revision(ModelBase, SearchMixin, AbstractRevision):
except IndexError:
return None
@classmethod
def get_mapping_type(cls):
return RevisionMetricsMappingType
class DraftRevision(ModelBase, SearchMixin, AbstractRevision):
class DraftRevision(ModelBase, AbstractRevision):
based_on = models.ForeignKey(Revision, on_delete=models.CASCADE)
content = models.TextField(blank=True)
locale = LocaleField(blank=False, db_index=True)
@@ -1149,91 +980,6 @@ class DraftRevision(ModelBase, SearchMixin, AbstractRevision):
title = models.CharField(max_length=255, blank=True)
@register_mapping_type
class RevisionMetricsMappingType(SearchMappingType):
seconds_ago_filter = "created__gte"
@classmethod
def get_model(cls):
return Revision
@classmethod
def get_index_group(cls):
return "metrics"
@classmethod
def get_mapping(cls):
return {
"properties": {
"id": {"type": "long"},
"model": {"type": "string", "index": "not_analyzed"},
"url": {"type": "string", "index": "not_analyzed"},
"indexed_on": {"type": "integer"},
"created": {"type": "date"},
"reviewed": {"type": "date"},
"locale": {"type": "string", "index": "not_analyzed"},
"product": {"type": "string", "index": "not_analyzed"},
"is_approved": {"type": "boolean"},
"creator_id": {"type": "long"},
"reviewer_id": {"type": "long"},
}
}
@classmethod
def extract_document(cls, obj_id, obj=None):
"""Extracts indexable attributes from an Answer."""
fields = [
"id",
"created",
"creator_id",
"reviewed",
"reviewer_id",
"is_approved",
"document_id",
]
composed_fields = ["document__locale", "document__slug"]
all_fields = fields + composed_fields
if obj is None:
model = cls.get_model()
obj_dict = model.objects.values(*all_fields).get(pk=obj_id)
else:
obj_dict = dict([(field, getattr(obj, field)) for field in fields])
obj_dict["document__locale"] = obj.document.locale
obj_dict["document__slug"] = obj.document.slug
d = {}
d["id"] = obj_dict["id"]
d["model"] = cls.get_mapping_type_name()
# We do this because get_absolute_url is an instance method
# and we don't want to create an instance because it's a DB
# hit and expensive. So we do it by hand. get_absolute_url
# doesn't change much, so this is probably ok.
d["url"] = reverse(
"wiki.revision",
kwargs={"revision_id": obj_dict["id"], "document_slug": obj_dict["document__slug"]},
)
d["indexed_on"] = int(time.time())
d["created"] = obj_dict["created"]
d["reviewed"] = obj_dict["reviewed"]
d["locale"] = obj_dict["document__locale"]
d["is_approved"] = obj_dict["is_approved"]
d["creator_id"] = obj_dict["creator_id"]
d["reviewer_id"] = obj_dict["reviewer_id"]
doc = Document.objects.get(id=obj_dict["document_id"])
d["product"] = [p.slug for p in doc.get_products()]
return d
register_for_indexing("revisions", Revision)
class HelpfulVote(ModelBase):
"""Helpful or Not Helpful vote on Revision."""

View file

@@ -1,195 +0,0 @@
from datetime import datetime, timedelta
from nose.tools import eq_
from kitsune.products.tests import ProductFactory, TopicFactory
from kitsune.search.tests.test_es import ElasticTestCase
from kitsune.wiki.tests import (
DocumentFactory,
RevisionFactory,
HelpfulVoteFactory,
RedirectRevisionFactory,
)
from kitsune.wiki.models import DocumentMappingType, RevisionMetricsMappingType
class DocumentUpdateTests(ElasticTestCase):
def test_add_and_delete(self):
"""Adding a doc should add it to the search index; deleting should
delete it."""
doc = DocumentFactory()
RevisionFactory(document=doc, is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().count(), 1)
doc.delete()
self.refresh()
eq_(DocumentMappingType.search().count(), 0)
def test_translations_get_parent_tags(self):
t1 = TopicFactory(display_order=1)
t2 = TopicFactory(display_order=2)
p = ProductFactory()
doc1 = DocumentFactory(title="Audio too loud", products=[p], topics=[t1, t2])
RevisionFactory(document=doc1, is_approved=True)
doc2 = DocumentFactory(title="Audio too loud bork bork", parent=doc1, tags=["badtag"])
RevisionFactory(document=doc2, is_approved=True)
# Verify the parent has the right tags.
doc_dict = DocumentMappingType.extract_document(doc1.id)
eq_(sorted(doc_dict["topic"]), sorted([t1.slug, t2.slug]))
eq_(doc_dict["product"], [p.slug])
# Verify the translation has the parent's tags.
doc_dict = DocumentMappingType.extract_document(doc2.id)
eq_(sorted(doc_dict["topic"]), sorted([t1.slug, t2.slug]))
eq_(doc_dict["product"], [p.slug])
def test_wiki_topics(self):
"""Make sure that adding topics to a Document causes it to
refresh the index.
"""
t = TopicFactory(slug="hiphop")
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
doc = DocumentFactory()
RevisionFactory(document=doc, is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
doc.topics.add(t)
self.refresh()
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 1)
doc.topics.clear()
self.refresh()
# Make sure the document itself is still there and that we didn't
# accidentally delete it through screwed up signal handling:
eq_(DocumentMappingType.search().filter().count(), 1)
eq_(DocumentMappingType.search().filter(topic=t.slug).count(), 0)
def test_wiki_products(self):
"""Make sure that adding products to a Document causes it to
refresh the index.
"""
p = ProductFactory(slug="desktop")
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
doc = DocumentFactory()
RevisionFactory(document=doc, is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
doc.products.add(p)
self.refresh()
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 1)
doc.products.remove(p)
self.refresh()
# Make sure the document itself is still there and that we didn't
# accidentally delete it through screwed up signal handling:
eq_(DocumentMappingType.search().filter().count(), 1)
eq_(DocumentMappingType.search().filter(product=p.slug).count(), 0)
def test_wiki_no_revisions(self):
"""Don't index documents without approved revisions"""
# Create a document with no revisions and make sure the
# document is not in the index.
doc = DocumentFactory()
self.refresh()
eq_(DocumentMappingType.search().count(), 0)
# Create a revision that's not approved and make sure the
# document is still not in the index.
RevisionFactory(document=doc, is_approved=False)
self.refresh()
eq_(DocumentMappingType.search().count(), 0)
def test_wiki_redirects(self):
"""Make sure we don't index redirects"""
# First create a revision that doesn't have a redirect and
# make sure it's in the index.
doc = DocumentFactory(title="wool hats")
RevisionFactory(document=doc, is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().query(document_title__match="wool").count(), 1)
# Now create a revision that is a redirect and make sure the
# document is removed from the index.
RedirectRevisionFactory(document=doc)
self.refresh()
eq_(DocumentMappingType.search().query(document_title__match="wool").count(), 0)
def test_wiki_keywords(self):
"""Make sure updating keywords updates the index."""
# Create a document with a revision with no keywords. It
# shouldn't show up with a document_keywords term query for
# 'wool' since it has no keywords.
doc = DocumentFactory(title="wool hats")
RevisionFactory(document=doc, is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().query(document_keywords="wool").count(), 0)
RevisionFactory(document=doc, is_approved=True, keywords="wool")
self.refresh()
eq_(DocumentMappingType.search().query(document_keywords="wool").count(), 1)
def test_recent_helpful_votes(self):
"""Recent helpful votes are indexed properly."""
# Create a document and verify it doesn't show up in a
# query for recent_helpful_votes__gt=0.
r = RevisionFactory(is_approved=True)
self.refresh()
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
# Add an unhelpful vote, it still shouldn't show up.
HelpfulVoteFactory(revision=r, helpful=False)
r.document.save() # Votes don't trigger a reindex.
self.refresh()
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
# Add a helpful vote created 31 days ago; it still shouldn't show up.
created = datetime.now() - timedelta(days=31)
HelpfulVoteFactory(revision=r, helpful=True, created=created)
r.document.save() # Votes don't trigger a reindex.
self.refresh()
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 0)
# Add a helpful vote created 29 days ago; it should show up now.
created = datetime.now() - timedelta(days=29)
HelpfulVoteFactory(revision=r, helpful=True, created=created)
r.document.save() # Votes don't trigger a reindex.
self.refresh()
eq_(DocumentMappingType.search().filter(document_recent_helpful_votes__gt=0).count(), 1)
class RevisionMetricsTests(ElasticTestCase):
def test_add_and_delete(self):
"""Adding a revision should add it to the index.
Deleting should delete it.
"""
r = RevisionFactory()
self.refresh()
eq_(RevisionMetricsMappingType.search().count(), 1)
r.delete()
self.refresh()
eq_(RevisionMetricsMappingType.search().count(), 0)
def test_data_in_index(self):
"""Verify the data we are indexing."""
p = ProductFactory()
base_doc = DocumentFactory(locale="en-US", products=[p])
d = DocumentFactory(locale="es", parent=base_doc)
r = RevisionFactory(document=d, is_approved=True)
self.refresh()
eq_(RevisionMetricsMappingType.search().count(), 1)
data = RevisionMetricsMappingType.search()[0]
eq_(data["is_approved"], r.is_approved)
eq_(data["locale"], d.locale)
eq_(data["product"], [p.slug])
eq_(data["creator_id"], r.creator_id)