
telemetry-streaming

Spark Streaming ETL jobs for Mozilla Telemetry

This service currently contains jobs that aggregate error data at 5-minute intervals. It generates the (internal-only) error_aggregates and experiment_error_aggregates Parquet tables at Mozilla.

Issue Tracking

Please file bugs in the Datasets: Error Aggregates component.

Amplitude Event Configuration

Some of the jobs defined in telemetry-streaming transform telemetry events and republish them to Amplitude for further analysis. Filtering and transformation are driven by JSON configurations. If you're creating or updating such a configuration, see the documentation in docs/amplitude.

Development

The recommended workflow is to edit the source code in your favorite editor and run the tests via sbt. Some common sbt invocations:

  • sbt test # run the basic set of tests (good enough for most purposes)
  • sbt "testOnly *ErrorAgg*" # run only the test classes whose names match *ErrorAgg*
  • sbt "testOnly *ErrorAgg* -- -z version" # same, limited to test cases with "version" in their names
  • sbt dockerComposeTest # run the docker compose tests (slow)
  • sbt "dockerComposeTest -tags:DockerComposeTag" # run only tests with DockerComposeTag (while using docker)
  • sbt scalastyle test:scalastyle # run linter
  • sbt ci # run the full set of continuous integration tests

Some tests require Kafka. To run them from an IDE, first start the test cluster:

sbt dockerComposeUp

or via plain docker-compose:

export DOCKER_KAFKA_HOST=$(./docker_setup.sh)
docker-compose -f docker/docker-compose.yml up

Shut the cluster down when you're done:

sbt dockerComposeStop
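
The start, test, and stop steps can be wrapped so the cluster is stopped even when the tests fail. The following is a minimal sketch, not part of the project: the function name `with_test_cluster` and the `SBT` override variable are hypothetical, and it assumes the `dockerComposeUp` and `dockerComposeStop` tasks behave as described above. It does not handle interruption (for example Ctrl-C) while the tests are running.

```shell
# Hypothetical override hook: lets the sbt binary be swapped out (e.g. for a dry run).
SBT="${SBT:-sbt}"

# Start the test cluster, run the given sbt task, then always stop the cluster.
# Returns the exit status of the sbt task, not of the shutdown step.
with_test_cluster() {
  "$SBT" dockerComposeUp || return 1   # give up if the cluster cannot start
  "$SBT" "$@"                          # e.g. "testOnly *ErrorAgg*"
  cluster_test_status=$?
  "$SBT" dockerComposeStop             # runs whether or not the tests passed
  return "$cluster_test_status"
}
```

With this loaded in a shell, `with_test_cluster "testOnly *ErrorAgg*"` would run the matching tests against a freshly started cluster and leave nothing running afterwards.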