# Apache Beam Jobs for Ingestion

This `ingestion-beam` Java module contains our [Apache Beam](https://beam.apache.org/) jobs for use in Ingestion.
Google Cloud Dataflow is a Google Cloud Platform service that natively runs
Apache Beam jobs.

The source code lives in the [ingestion-beam](https://github.com/mozilla/gcp-ingestion/tree/main/ingestion-beam)
subdirectory of the gcp-ingestion repository.

The main Beam jobs are listed below; see their respective sections in the
documentation for details:

- [Decoder job](./decoder-job.md): A job for normalizing ingestion messages
- [Republisher job](./republisher-job.md): A job for republishing subsets of decoded messages to new destinations

There are a few additional jobs for special cases listed in the index for this section.

## Building

Move to the `ingestion-beam` subdirectory of your gcp-ingestion checkout and run:

```bash
./bin/mvn clean compile
```

See the documentation for each job for details on how to run what you've built.

## Testing

Before anything else, be sure to download the test data:

```bash
./bin/download-cities15000
./bin/download-geolite2
./bin/download-schemas
```
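If it helps to script this step, the downloads can be wrapped so that a failure stops the sequence early. The following is only a sketch: `run_all` is a hypothetical helper, not something provided by the repository, and it assumes each script takes no arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gcp-ingestion): run each command in
# order and stop at the first one that exits with a non-zero status.
run_all() {
  local cmd
  for cmd in "$@"; do
    echo "running: $cmd"
    if ! "$cmd"; then
      echo "failed: $cmd" >&2
      return 1
    fi
  done
}

# Example usage from the ingestion-beam directory:
#   run_all ./bin/download-cities15000 ./bin/download-geolite2 ./bin/download-schemas
```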

Run tests locally with the [CircleCI Local CLI](https://circleci.com/docs/2.0/local-cli/#installing-the-circleci-local-cli-on-macos-and-linux-distros):

```bash
(cd .. && circleci build --job ingestion-beam)
```

To make more targeted test invocations, you can install Java and Maven locally or
use the `bin/mvn` executable to run Maven in Docker:

```bash
./bin/mvn clean test
```

To run just a single test class or a single test case, try something like this:

```bash
# Run all tests in a single class
./bin/mvn test -Dtest=com.mozilla.telemetry.util.SnakeCaseTest

# Run only a single test case
./bin/mvn test -Dtest='com.mozilla.telemetry.util.SnakeCaseTest#testSnakeCaseFormat'
```
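The `-Dtest` value follows Maven Surefire's `Class#method` convention. As a small sketch of how that selector is put together, the hypothetical `test_selector` helper below (an illustration, not part of the repository) builds it from a class name and an optional method name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a Surefire -Dtest selector.
#   $1 = fully qualified test class, $2 = optional test method
test_selector() {
  local class="$1"
  local method="${2:-}"
  if [ -n "$method" ]; then
    printf '%s#%s\n' "$class" "$method"
  else
    printf '%s\n' "$class"
  fi
}

# Example usage; quote the expansion so the selector is passed as one argument:
#   ./bin/mvn test -Dtest="$(test_selector com.mozilla.telemetry.util.SnakeCaseTest testSnakeCaseFormat)"
```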

To run the project in a sandbox against production data, see this document on
[configuring an integration testing workflow](./ingestion_testing_workflow.md).

## Code Formatting

Use Spotless to automatically reformat code:

```bash
mvn spotless:apply
```

Or just check which changes it would require:

```bash
mvn spotless:check
```
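One way to combine the two goals is to run the check first and only suggest the reformat when it fails. The sketch below assumes that the check command exits non-zero when files need reformatting, which is how `spotless:check` is generally expected to behave; `check_format` is a hypothetical helper that takes the check command as its arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the given formatting check command and print a
# hint on stderr when it fails. The command is passed in as arguments.
check_format() {
  if "$@"; then
    echo "formatting OK"
  else
    echo "formatting check failed; try: mvn spotless:apply" >&2
    return 1
  fi
}

# Example usage:
#   check_format mvn spotless:check
```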