Scripts to export Play Store app data to BigQuery using Transfer Service
Ben Wu 06a0e08ffb Add new repo location to readme 2021-02-04 15:08:45 -05:00
.circleci Add gcr push to ci 2020-06-24 12:36:07 -04:00
play_store_export Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00
tests Create cancel transfer script 2020-06-23 13:09:25 -04:00
.flake8 Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00
.gitignore Initialize repo structure 2020-06-02 17:56:15 -04:00
Dockerfile Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00
LICENSE Initialize repo structure 2020-06-02 17:56:15 -04:00
Makefile Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00
README.md Add new repo location to readme 2021-02-04 15:08:45 -05:00
requirements.dev.txt Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00
requirements.txt Initialize repo structure 2020-06-02 17:56:15 -04:00
setup.py Update ci config, dockerfile, makefile 2020-06-23 13:09:25 -04:00

README.md

Play Store Export

This repo has been moved to the docker-etl repo at https://github.com/mozilla/docker-etl/tree/main/jobs/play-store-export

This job schedules backfills of Play Store data to BigQuery via the BigQuery Data Transfer Service.

The job is meant to run on a regular schedule so that past days are continuously backfilled. Play Store data for past days can keep changing after it is first published (e.g. data for a day two weeks ago can still be updated), so regular backfills of at least 30 days are required. This affects the retained installers metric in particular. The BigQuery Play Store transfer job has a non-configurable refresh window of 7 days, which is insufficient.
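
As an illustration of the backfill window described above, here is a minimal sketch that lists the days a single run would cover. The function name `backfill_dates` is hypothetical and not part of this repo, and treating the start date as the first day of the window is an assumption.

```python
from datetime import date, timedelta
from typing import List


def backfill_dates(start: date, day_count: int) -> List[date]:
    """Return `day_count` dates, beginning at `start` and going backwards.

    Hypothetical helper for illustration only. Including `start` itself
    in the window is an assumption, not confirmed by the repo.
    """
    if day_count < 1:
        raise ValueError("day_count must be at least 1")
    # offset 0 is the start date; each later offset steps one day back
    return [start - timedelta(days=offset) for offset in range(day_count)]


if __name__ == "__main__":
    # A 30-day window covers far more than the 7-day refresh window.
    window = backfill_dates(date(2020, 6, 30), 30)
    print(window[0].isoformat(), "to", window[-1].isoformat())
```

Running this on a schedule with a fixed day count means each past day is refreshed many times before it leaves the window.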

These scripts require that a Play Store transfer config already exists and that the current gcloud user has permission to create jobs in the project.

See the Google Play transfers documentation for more details.

Usage

Start a backfill by running python3 play_store_export/export.py:

usage: export.py [-h] --date DATE --project PROJECT --transfer-config
                 TRANSFER_CONFIG [--transfer-location TRANSFER_LOCATION]
                 [--backfill-day-count BACKFILL_DAY_COUNT]

optional arguments:
  -h, --help            show this help message and exit
  --date DATE           Date at which the backfill will start, going backwards
  --project PROJECT     Either the project that the source GCS project belongs
                        to or the project that contains the transfer config
  --transfer-config TRANSFER_CONFIG
                        ID of the transfer config. This should be a UUID.
  --transfer-location TRANSFER_LOCATION
                        Region of the transfer config (defaults to `us`)
  --backfill-day-count BACKFILL_DAY_COUNT
                        Number of days to backfill
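
The `--project`, `--transfer-location` and `--transfer-config` values together identify one transfer config. The sketch below shows how they could be combined into the resource name format used by the BigQuery Data Transfer API. The helper `transfer_config_name` and the example values are hypothetical, and whether export.py builds the name this way is an assumption.

```python
def transfer_config_name(project: str, config_id: str, location: str = "us") -> str:
    """Build a BigQuery Data Transfer config resource name.

    Hypothetical helper for illustration only. The default location
    mirrors the `us` default documented for --transfer-location.
    """
    if not project or not config_id or not location:
        raise ValueError("project, config_id and location must be non-empty")
    return f"projects/{project}/locations/{location}/transferConfigs/{config_id}"


if __name__ == "__main__":
    # Placeholder project and UUID, not real resources.
    print(transfer_config_name("my-project", "00000000-0000-0000-0000-000000000000"))
```

A name of this shape is what the Data Transfer client library expects as the parent when requesting manual transfer runs.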

Develop

This project uses the BigQuery Data Transfer Python library: https://googleapis.dev/python/bigquerydatatransfer/latest/index.html

Install python dependencies with:

make install

Run tests with:

make test

Run linter with:

make lint

The script play_store_export/cancel_transfers.py cancels all running transfer jobs, which may be useful during development and testing.