LookML Generator for Glean and Mozilla Data
.circleci Ignore comments when verifying requirements.txt (#438) 2022-05-11 12:13:05 -07:00
.github Tag group for review on dependabot PRs 2021-05-18 13:59:01 -04:00
architecture Add notes about multiple view per file convention (#159) 2021-06-08 22:42:40 -04:00
bin Remove generate_looker_content call (#425) 2022-04-25 15:30:12 -07:00
generator add wildcard to rally- and pioneer- (#454) 2022-06-06 10:56:46 -07:00
tests Test non-wildcard disallows (#458) 2022-06-06 20:14:24 +02:00
.dockerignore feat: Dockerfile tweaks (#434) 2022-05-11 09:23:17 -07:00
.flake8 Add links to generated measures 2021-05-05 13:33:50 -04:00
.gitignore Hackweek: Adding mozilla www metrics and summeries to Looker (#437) 2022-05-12 09:04:42 -04:00
.isort.cfg Generate namespaces.yaml (#9) 2021-03-08 13:45:24 -08:00
.pre-commit-config.yaml Add support for time partitioning fields other than submission_date (#450) 2022-05-25 14:58:06 -07:00
.yamllint.yaml Add fxa namespace 2021-06-25 09:38:28 -04:00
Dockerfile Fix ssh-keygen missing in docker (#440) 2022-05-11 14:01:41 -07:00
LICENSE Initial commit 2021-02-08 11:21:24 -05:00
Makefile WIP: Script and infra for LookML generation 2021-04-19 13:50:45 -04:00
README.md Use --no-deps when installing compiled requirements files (#370) 2022-02-24 11:44:07 -08:00
custom-namespaces.yaml Added acoustic_contacts_current_snapshot view (+ rename) (#456) 2022-06-03 15:58:54 +02:00
docker-compose.yml Add UPDATE_SPOKE_BRANCHES option to generate 2021-09-22 11:28:00 -07:00
namespaces-disallowlist.yaml add wildcard to rally- and pioneer- (#454) 2022-06-06 10:56:46 -07:00
netlify.toml Add Operational Monitoring Views and Explores 2021-10-05 12:16:00 -04:00
pytest.ini Generate namespaces.yaml (#9) 2021-03-08 13:45:24 -08:00
requirements.in Bump google-cloud-bigquery from 3.1.0 to 3.2.0 (#460) 2022-06-09 08:43:08 -07:00
requirements.txt Bump google-cloud-bigquery from 3.1.0 to 3.2.0 (#460) 2022-06-09 08:43:08 -07:00
setup.py Finalize container and script to publish LookML 2021-04-19 13:50:45 -04:00

README.md

lookml-generator

mozilla

Under Active Development

LookML Generator for Glean and Mozilla Data.

The lookml-generator has two important roles:

  1. Generate a listing of all Glean/Mozilla namespaces and their associated BigQuery tables
  2. From that listing, generate LookML for views, explores, and dashboards and push those to the looker-hub project

Generating Namespace Listings

At Mozilla, a namespace is a single functional area that is represented in Looker with (usually) one model*. Each Glean application is self-contained within a single namespace, containing the data from across that application's channels. We also support custom namespaces, which can use wildcards to denote their BigQuery datasets and tables. These are described in custom-namespaces.yaml.

* Though namespaces are not limited to a single model, we advise it for clarity's sake.
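
As a rough illustration of how a wildcard could select BigQuery tables for a custom namespace, here is a minimal sketch using shell-style patterns from Python's standard library. The table names and the `match_tables` helper are hypothetical, and the generator's actual matching logic may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical fully qualified table names; a real listing would come from BigQuery.
TABLES = [
    "mozdata.fenix.baseline",
    "mozdata.fenix.metrics",
    "mozdata.firefox_desktop.baseline",
]


def match_tables(pattern, tables):
    """Return the tables whose name matches a shell-style wildcard pattern."""
    return [table for table in tables if fnmatchcase(table, pattern)]


if __name__ == "__main__":
    # Select every table in one (hypothetical) dataset.
    print(match_tables("mozdata.fenix.*", TABLES))
```

Note that `*` in `fnmatchcase` also matches dots, so a pattern such as `mozdata.*.baseline` would select the `baseline` table in every dataset of this example listing.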

Adding Custom Namespaces

Custom namespaces need to be defined explicitly in custom-namespaces.yaml. For each namespace, specify the views and explores to be generated.

Make sure the custom namespace is not listed in namespaces-disallowlist.yaml.
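
Before opening a PR, the two requirements above can be checked mechanically. The following is a minimal sketch, not the generator's own validation: the `check_custom_namespace` helper, the dictionary layout of a namespace definition, and the pattern syntax of the disallowlist entries are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def check_custom_namespace(name, definition, disallowed_patterns):
    """Return a list of problems; an empty list means no problem was found.

    `definition` is assumed to be a dict with non-empty "views" and
    "explores" entries, and `disallowed_patterns` is assumed to hold exact
    names or shell-style wildcards. Both layouts are hypothetical.
    """
    problems = []
    for key in ("views", "explores"):
        if not definition.get(key):
            problems.append(f"{name}: missing '{key}'")
    for pattern in disallowed_patterns:
        if fnmatchcase(name, pattern):
            problems.append(f"{name}: matches disallow entry '{pattern}'")
    return problems


if __name__ == "__main__":
    definition = {"views": {"my_view": {}}, "explores": {"my_explore": {}}}
    print(check_custom_namespace("my_namespace", definition, ["rally-*"]))
```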

Once changes have been approved and merged, the lookml-generator changes can get deployed.

Generating LookML

Once we know which tables are associated with which namespaces, we can generate LookML files and update our Looker instance.

lookml-generator generates LookML based on both the BigQuery schema and manual changes. For example, we would want to add city drill-downs for all country fields.
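
To make the idea of combining a schema with manual rules concrete, here is a minimal sketch that turns a list of BigQuery-style schema fields into LookML-like dimension dictionaries and attaches a city drill-down to a country field. The `build_dimensions` helper, the type mapping, and the field layout are hypothetical and are not taken from the generator's code.

```python
# Assumed mapping from BigQuery column types to LookML dimension types.
LOOKML_TYPES = {"STRING": "string", "INTEGER": "number", "BOOLEAN": "yesno"}


def build_dimensions(schema_fields):
    """Build one dimension dict per schema field.

    `schema_fields` is assumed to be a list of {"name": ..., "type": ...}
    dicts. A field named "country" gets a drill-down to "city" when a
    "city" field exists in the same schema.
    """
    names = {field["name"] for field in schema_fields}
    dimensions = []
    for field in schema_fields:
        dimension = {
            "name": field["name"],
            "type": LOOKML_TYPES.get(field["type"], "string"),
            "sql": f"${{TABLE}}.{field['name']}",
        }
        if field["name"] == "country" and "city" in names:
            dimension["drill_fields"] = ["city"]
        dimensions.append(dimension)
    return dimensions


if __name__ == "__main__":
    schema = [
        {"name": "country", "type": "STRING"},
        {"name": "city", "type": "STRING"},
        {"name": "client_count", "type": "INTEGER"},
    ]
    for dimension in build_dimensions(schema):
        print(dimension)
```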

Pushing Changes to Dev Branches

In addition to pushing new LookML to the main branch, we reset the dev branches to point to the same commit as main. This only happens during production deployment runs.

To automate this process for your dev branch, add it to this file. You can edit that file in your browser. Open a PR and tag data-looker for review. You can find your dev branch by going to Looker, entering development mode, opening the looker-hub project, clicking the "Git Actions" icon, and finding your personal branch in the "Current Branch" dropdown.

Setup

Ensure Python 3.8+ is available on your machine (see this guide for instructions if you're on a Mac and haven't installed anything other than the default system Python).

You will also need the Google Cloud SDK with valid credentials. After setting up the Google Cloud SDK, run:

gcloud config set project moz-fx-data-shared-prod
gcloud auth login --update-adc

Install requirements in a Python venv

python3.8 -m venv venv/
venv/bin/pip install --no-deps -r requirements.txt

Update requirements when they change with pip-sync

venv/bin/pip-sync

Set up pre-commit hooks

venv/bin/pre-commit install

Run unit tests and linters

venv/bin/pytest

Run integration tests

venv/bin/pytest -m integration

Note that the integration tests require a valid login to BigQuery to succeed.

Testing generation locally

You can test namespace generation by running:

./bin/generator namespaces

To generate the actual LookML (in looker-hub), run:

./bin/generator lookml

Container Development

Most code changes will not require changes to the generation script or container. However, you can test it locally. The following script will test generation, pushing a new branch to the looker-hub repository:

export HUB_BRANCH_PUBLISH="yourname-generation-test-1"
export GIT_SSH_KEY_BASE64=$(cat ~/.ssh/id_rsa | base64)
make build && make run

Deploying new lookml-generator changes

lookml-generator runs daily to update the looker-hub and looker-spoke-default code. Changes to the underlying tables should automatically propagate to their respective views and explores.

However, changes to lookml-generator need to be tested on stage and deployed. The general process is the following:

  1. Create a PR and test on dev. It is not necessary to add Looker credentials, but the container changes should run using make build && make run, with changes reflected in the LookML repos.
  2. Once merged, the changes should run on stage. They run automatically after schema deploys, but they can also be run manually by clearing the lookml_generator_staging task in Airflow.
  3. Once the changes are confirmed on stage, tag a new release here and add a description of what the release includes. Then change the Airflow variable lookml_generator_release_str to the version string you created when cutting the release. Re-run the DAG and the changes should take effect.