The ETL Tasks
This ETL library performs many functions, all of which are listed in this directory. Here are some, most important first:
Module pulse_logger (on tc-logger branch)
A stand-alone program that stays connected to Mozilla's Pulse queue, archives the messages to S3, and puts them on the work queue. This program is the start of the ETL pipeline, and the most important part of it, because Pulse messages last only a couple of hours before they are lost (due to queue overflow).
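As a rough illustration of the archive-then-enqueue flow described above, here is a minimal sketch. It does not reflect the actual pulse_logger code: the function names, the batching, and the key format are all hypothetical, and a plain dict and deque stand in for S3 and the work queue.

```python
import json
from collections import deque


def archive_and_enqueue(messages, archive, work_queue, batch_size=3):
    """Store messages in batches, then queue a reference to each batch.

    `archive` (a dict) stands in for S3 and `work_queue` (a deque) stands in
    for the work queue. Batching is an assumption made for this sketch.
    """
    keys = []
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            keys.append(_flush(batch, archive, work_queue, len(keys)))
            batch = []
    if batch:
        # Do not lose a partially filled final batch.
        keys.append(_flush(batch, archive, work_queue, len(keys)))
    return keys


def _flush(batch, archive, work_queue, batch_number):
    key = "batch-%d" % batch_number  # hypothetical key format
    # Archive first, so a queued item never points at a missing archive entry.
    archive[key] = "\n".join(json.dumps(m, sort_keys=True) for m in batch)
    work_queue.append({"key": key})
    return key
```

Archiving before enqueueing means a downstream worker can always find the data that a queue item refers to.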
Module etl
This contains the main routine responsible for using transforms and applying them against a queue of work to be done.
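The idea of applying transforms against a queue of work can be sketched as a simple loop. This is only an illustration under assumed names: `run_etl`, the `type` and `key` fields of a work item, and the dict-based source and sink are hypothetical and are not taken from the etl module.

```python
from collections import deque


def run_etl(work_queue, transforms, source, sink):
    """Drain the work queue, applying the matching transform to each item.

    `transforms` maps a (hypothetical) item type to a function; `source` and
    `sink` are dicts standing in for the input and output stores.
    """
    processed = 0
    while work_queue:
        item = work_queue.popleft()
        transform = transforms[item["type"]]  # pick the transform for this work item
        records = source[item["key"]]
        sink[item["key"]] = [transform(record) for record in records]
        processed += 1
    return processed
```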
Module backfill
Given a set of conditions, this will review S3 and fill the work queue with the items that are not found in ES.
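Conceptually, a backfill is a set difference between what is in S3 and what is in ES, filtered by the given conditions. The sketch below shows that idea only: the `backfill` function, its parameters, and the integer keys are hypothetical, and it makes no calls to S3 or ES.

```python
def backfill(s3_keys, es_keys, work_queue, condition=lambda key: True):
    """Queue every key that is in S3, satisfies `condition`, and is missing from ES.

    All arguments are plain Python collections standing in for the real stores.
    """
    missing = sorted(k for k in set(s3_keys) - set(es_keys) if condition(k))
    for key in missing:
        work_queue.append({"key": key})
    return missing
```

For example, with S3 keys 1 to 5, ES keys 2 and 4, and the condition `key >= 3`, only keys 3 and 5 are queued.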
Module push_to_es
Responsible for adding S3 records into ES, with little or no transform.
Module update_etl (on etl branch)
If the etl or transform code is changed, you can push those changes to the worker machines immediately. All workers use the etl branch, so be sure the changes you want are there.
python.exe activedata_etl/update_etl.py --settings=resources/settings/update_etl.json