telemetry-streaming

Spark Streaming ETL jobs for Mozilla Telemetry

This service currently contains jobs that aggregate error data on 5-minute intervals. It is responsible for generating the (internal only) error_aggregates and experiment_error_aggregates parquet tables at Mozilla.
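
For orientation, here is a minimal sketch (not the actual implementation; see src/ for the real jobs) of the general shape such a job takes with Spark Structured Streaming: count records per 5-minute window and write the completed windows out as parquet. The Kafka topic name, output path, and checkpoint location are illustrative assumptions only:

// Sketch only: a 5-minute windowed streaming aggregation in the spirit of the
// error_aggregates job. Topic, columns, and paths are hypothetical placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object ErrorAggregatesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("error-aggregates-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical source: raw pings arriving on a Kafka topic (requires spark-sql-kafka).
    val pings = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers",
        sys.env.getOrElse("DOCKER_KAFKA_HOST", "localhost") + ":9092")
      .option("subscribe", "telemetry") // hypothetical topic name
      .load()

    // Count records per 5-minute window of the Kafka message timestamp.
    val aggregates = pings
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"))
      .count()

    // Write each completed window out as parquet, as the error_aggregates tables are.
    aggregates.writeStream
      .format("parquet")
      .option("path", "error_aggregates")                // hypothetical output location
      .option("checkpointLocation", "/tmp/checkpoints")  // hypothetical checkpoint dir
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}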

Issue Tracking

Please file bugs in the Datasets: Error Aggregates component.

Amplitude Event Configuration

Some of the jobs defined in telemetry-streaming exist to transform telemetry events and republish them to Amplitude for further analysis. Filtering and transforming events is driven by JSON configuration files (see the configs directory). If you're creating or updating such a configuration, see the documentation under docs/amplitude.
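
As a rough illustration of config-driven filtering, the sketch below reads a list of allowed event names from JSON (using json4s, which ships with Spark) and filters incoming events against it. The allowedEvents field and the event names are invented for illustration; they are not the project's actual configuration schema:

// Hypothetical sketch of config-driven event filtering; the JSON layout and event
// names are invented for illustration and are not the real Amplitude config schema.
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object AmplitudeConfigSketch {
  implicit val formats: Formats = DefaultFormats

  // Made-up config listing which event names should be republished.
  val config = parse("""{ "allowedEvents": ["session_start", "session_split"] }""")
  val allowedEvents = (config \ "allowedEvents").extract[List[String]].toSet

  def main(args: Array[String]): Unit = {
    val incoming = Seq("session_start", "uri_count", "session_split")
    val republished = incoming.filter(allowedEvents.contains)
    println(republished) // List(session_start, session_split)
  }
}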

Development

The recommended workflow is to edit the source code in your favorite editor and run the tests via sbt. Some common sbt invocations:

  • sbt test # run the basic set of tests (good enough for most purposes)
  • sbt "testOnly *ErrorAgg*" # run the tests only for packages matching ErrorAgg
  • sbt "testOnly *ErrorAgg* -- -z version" # run the tests only for packages matching ErrorAgg, limited to test cases with "version" in them
  • sbt dockerComposeTest # run the docker compose tests (slow)
  • sbt "dockerComposeTest -tags:DockerComposeTag" # run only tests with DockerComposeTag (while using docker)
  • sbt scalastyle test:scalastyle # run linter
  • sbt ci # run the full set of continuous integration tests

Some tests require a running Kafka cluster. If you prefer to run them from an IDE, you first need to start the test cluster:

sbt dockerComposeUp

or via plain docker-compose:

export DOCKER_KAFKA_HOST=$(./docker_setup.sh)
docker-compose -f docker/docker-compose.yml up

Remember to shut down the cluster afterwards:

sbt dockerComposeStop