======
Celery
======

`Celery <http://celeryproject.org/>`_ is a task queue powered by RabbitMQ. You
can use it for anything that doesn't need to complete in the current
request-response cycle. Or use it `wherever Les tells you to use it
<http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone>`_.

For example, each addon has a ``current_version`` cached property. On its
initial run this query puts strain on our database. We can create a
denormalized database field called ``current_version`` on the ``addons``
table.

We'll need to populate it regularly so it has fairly up-to-date data. We can
do this in a process outside the request-response cycle. This is where Celery
comes in.

Installation
------------

RabbitMQ
~~~~~~~~

Celery depends on RabbitMQ. If you use ``homebrew`` you can install this:

::

    brew install rabbitmq

Setting up rabbitmq involves some configuration. You may want to define the
following ::

    # On a Mac, you can find this in System Preferences > Sharing
    export HOSTNAME='<laptop name>.local'

Then run the following commands: ::

    # Set your host up so it's semi-permanent
    sudo scutil --set HostName $HOSTNAME

    # Update your hosts by either:
    # 1) Manually editing /etc/hosts
    # 2) `echo 127.0.0.1 $HOSTNAME >> /etc/hosts`

    # RabbitMQ insists on writing to /var
    sudo rabbitmq-server -detached

    # Set up rabbity things (sudo is required to read the cookie file)
    sudo rabbitmqctl add_user zamboni zamboni
    sudo rabbitmqctl add_vhost zamboni
    sudo rabbitmqctl set_permissions -p zamboni zamboni ".*" ".*" ".*"

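Your Django settings then need to point Celery at that broker. Here's a
minimal sketch, assuming Celery 2.x-style ``BROKER_*`` settings and the
user/vhost created above; check your settings file for the names your Celery
version actually uses: ::

    # Broker settings sketch -- values mirror the rabbitmqctl commands above.
    BROKER_HOST = 'localhost'
    BROKER_PORT = 5672
    BROKER_USER = 'zamboni'
    BROKER_PASSWORD = 'zamboni'
    BROKER_VHOST = 'zamboni'
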
Back in safe and happy django-land you should be able to run: ::

    ./manage.py celeryd $OPTIONS

Celery understands python, and any tasks that you have defined in your app
are now runnable asynchronously.

Celery Tasks
------------

Any python function can be set as a celery task. For example, let's say we want
to update our ``current_version`` but we don't care how quickly it happens, just
that it happens. We can define it like so: ::

    @task(rate_limit='2/m')
    def _update_addons_current_version(data, **kw):
        task_log.debug("[%s@%s] Updating addons current_versions." %
                       (len(data), _update_addons_current_version.rate_limit))
        for pk in data:
            try:
                addon = Addon.objects.get(pk=pk[0])
                addon.update_current_version()
            except Addon.DoesNotExist:
                task_log.debug("Missing addon: %d" % pk)

``@task`` is a decorator that lets Celery find our tasks. We can specify a
``rate_limit`` like ``2/m``, which means ``celeryd`` will run this task at
most 2 times a minute. This keeps write-heavy tasks from killing your
database.

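For a one-off run you don't need anything fancy: every Celery task has a
``delay()`` method, shorthand for ``apply_async()``, that queues a single
execution. A minimal sketch with a made-up addon id, shaped as a nested list
because the task indexes ``pk[0]`` on each row: ::

    # From, say, ./manage.py shell: this returns almost immediately and
    # celeryd runs the update later, subject to the 2/m rate limit.
    _update_addons_current_version.delay([[42]])
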
To queue updates for many addons at once, we can batch them up like so: ::

    from celery.task.sets import TaskSet

    ts = [_update_addons_current_version.subtask(args=[pks])
          for pks in amo.utils.chunked(all_pks, 300)]
    TaskSet(ts).apply_async()

All the Addons with ids in ``all_pks`` will (eventually) have their
``current_version`` updated.

Cron Jobs
~~~~~~~~~

This is all good, but let's automate it. In Zamboni we can create cron
jobs like so: ::

    @cronjobs.register
    def update_addons_current_version():
        """Update the current_version field of the addons."""
        d = Addon.objects.valid().exclude(
            type=amo.ADDON_PERSONA).values_list('id')

        with establish_connection() as conn:
            for chunk in chunked(d, 1000):
                print chunk
                _update_addons_current_version.apply_async(
                    args=[chunk], connection=conn)

This job will hit all the addons and run the task we defined in small batches
of 1000.

We'll need to add this to both the ``prod`` and ``preview`` crontabs so that
it runs in production.

Better than Cron
~~~~~~~~~~~~~~~~

Of course, cron is old school. We want to do better than cron, or at least not
rely on brute force tactics.

For a surgical strike, we can call ``_update_addons_current_version`` any time
we add a new version to that addon. Celery will execute it at the prescribed
rate, and your data will be updated ... eventually.

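How you hook that in is up to you. One sketch uses a Django ``post_save``
signal on a hypothetical ``Version`` model; the model, the ``addon_id``
attribute, and the exact wiring are illustrative, not Zamboni's actual code: ::

    from django.db.models.signals import post_save

    def queue_current_version_update(sender, instance, **kwargs):
        # Queue a single-addon update; the nested list matches the shape
        # the task expects. Celery still applies the 2/m rate limit.
        _update_addons_current_version.delay([[instance.addon_id]])

    post_save.connect(queue_current_version_update, sender=Version)
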
During Development
------------------

``celeryd`` only knows about your code as it was when the process started. If
you change a ``@task`` function, you'll need to ``HUP`` the process so it
picks up the new code.

However, once the ``@task`` itself is running perfectly, you can tweak all the
code that calls it, including cron jobs, without needing to restart
``celeryd``.