======
Celery
======

`Celery <http://celeryproject.org/>`_ is a task queue powered by RabbitMQ. You
can use it for anything that doesn't need to complete in the current
request-response cycle. Or use it `wherever Les tells you to use it
<http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone>`_.

For example, each addon has a ``current_version`` cached property. On its
initial run this query puts strain on our database. We can create a
denormalized database field called ``current_version`` on the ``addons``
table.

We'll need to populate it regularly so it has fairly up-to-date data. We can
do this in a process outside the request-response cycle. This is where Celery
comes in.

Installation
------------

RabbitMQ
~~~~~~~~

Celery depends on RabbitMQ. If you use ``homebrew`` you can install this:

::

    brew install rabbitmq

Setting up rabbitmq involves some configuration. You may want to define the
following ::

    # On a Mac, you can find this in System Preferences > Sharing
    export HOSTNAME='<laptop name>.local'

Then run the following commands: ::

    # Set your host up so it's semi-permanent
    sudo scutil --set HostName $HOSTNAME

    # Update your hosts by either:
    # 1) Manually editing /etc/hosts
    # 2) `echo 127.0.0.1 $HOSTNAME >> /etc/hosts`

    # RabbitMQ insists on writing to /var
    sudo rabbitmq-server -detached

    # Set up rabbity things (sudo is required to read the cookie file)
    sudo rabbitmqctl add_user zamboni zamboni
    sudo rabbitmqctl add_vhost zamboni
    sudo rabbitmqctl set_permissions -p zamboni zamboni ".*" ".*" ".*"

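Your Django settings then need to point Celery at that broker. Here's a
minimal sketch, assuming Celery 2.x-style ``BROKER_*`` settings and the
user/vhost created above; check your settings file for the names your Celery
version actually uses: ::

    # Broker settings sketch -- values mirror the rabbitmqctl commands above.
    BROKER_HOST = 'localhost'
    BROKER_PORT = 5672
    BROKER_USER = 'zamboni'
    BROKER_PASSWORD = 'zamboni'
    BROKER_VHOST = 'zamboni'
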
Back in safe and happy django-land you should be able to run: ::

    ./manage.py celeryd $OPTIONS

Celery understands python, and any tasks that you have defined in your app
are now runnable asynchronously.

Celery Tasks
------------

Any python function can be set as a celery task. For example, let's say we want
to update our ``current_version`` but we don't care how quickly it happens, just
that it happens. We can define it like so: ::

    @task(rate_limit='2/m')
    def _update_addons_current_version(data, **kw):
        task_log.debug("[%s@%s] Updating addons current_versions." %
                       (len(data), _update_addons_current_version.rate_limit))
        for pk in data:
            try:
                addon = Addon.objects.get(pk=pk[0])
                addon.update_current_version()
            except Addon.DoesNotExist:
                task_log.debug("Missing addon: %d" % pk)

``@task`` is a decorator that lets Celery find our tasks. We can specify a
``rate_limit`` like ``2/m``, which means ``celeryd`` will run this task at
most 2 times a minute. This keeps write-heavy tasks from killing your
database.

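For a one-off run you don't need anything fancy: every Celery task has a
``delay()`` method, shorthand for ``apply_async()``, that queues a single
execution. A minimal sketch with a made-up addon id, shaped as a nested list
because the task indexes ``pk[0]`` on each row: ::

    # From, say, ./manage.py shell: this returns almost immediately and
    # celeryd runs the update later, subject to the 2/m rate limit.
    _update_addons_current_version.delay([[42]])
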
To queue updates for many addons at once, we can batch them up like so: ::

    from celery.task.sets import TaskSet

    ts = [_update_addons_current_version.subtask(args=[pks])
          for pks in amo.utils.chunked(all_pks, 300)]
    TaskSet(ts).apply_async()

All the Addons with ids in ``all_pks`` will (eventually) have their
``current_version`` updated.

Cron Jobs
~~~~~~~~~

This is all good, but let's automate it. In Zamboni we can create cron
jobs like so: ::

    @cronjobs.register
    def update_addons_current_version():
        """Update the current_version field of the addons."""
        d = Addon.objects.valid().exclude(
            type=amo.ADDON_PERSONA).values_list('id')

        with establish_connection() as conn:
            for chunk in chunked(d, 1000):
                print chunk
                _update_addons_current_version.apply_async(
                    args=[chunk], connection=conn)

This job will hit all the addons and run the task we defined in small batches
of 1000.

We'll need to add this to both the ``prod`` and ``preview`` crontabs so that
it runs in production.

Better than Cron
~~~~~~~~~~~~~~~~

Of course, cron is old school. We want to do better than cron, or at least not
rely on brute force tactics.

For a surgical strike, we can call ``_update_addons_current_version`` any time
we add a new version to that addon. Celery will execute it at the prescribed
rate, and your data will be updated ... eventually.

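How you hook that in is up to you. One sketch uses a Django ``post_save``
signal on a hypothetical ``Version`` model; the model, the ``addon_id``
attribute, and the exact wiring are illustrative, not Zamboni's actual code: ::

    from django.db.models.signals import post_save

    def queue_current_version_update(sender, instance, **kwargs):
        # Queue a single-addon update; the nested list matches the shape
        # the task expects. Celery still applies the 2/m rate limit.
        _update_addons_current_version.delay([[instance.addon_id]])

    post_save.connect(queue_current_version_update, sender=Version)
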
During Development
------------------

``celeryd`` only knows about your code as it was when the process started. If
you change a ``@task`` function, you'll need to ``HUP`` the process so it
picks up the new code.

However, once the ``@task`` itself is running perfectly, you can tweak all the
code that calls it, including cron jobs, without needing to restart
``celeryd``.