Update ci config, dockerfile, makefile

This commit is contained in:
Benjamin Wu 2020-06-10 11:17:43 -04:00 committed by Ben Wu
Parent f66f13ca12
Commit 3747e0b5d9
9 changed files: 125 additions and 26 deletions

View File

@ -1,5 +1,16 @@
version: 2.1
commands:
restore-docker:
description: Restore Docker image cache
steps:
- setup_remote_docker
- restore_cache:
key: v1-{{.Branch}}
- run:
name: Restore Docker image cache
command: docker load -i /cache/docker.tar
jobs:
build:
docker:
@ -19,18 +30,21 @@ jobs:
- /cache/docker.tar
test:
docker:
docker: &docker
- image: docker:18.06.0-ce
steps:
- setup_remote_docker
- restore_cache:
key: v1-{{.Branch}}
- run:
name: Restore Docker image cache
command: docker load -i /cache/docker.tar
- restore-docker
- run:
name: Test Code
command: docker run app:build test
command: docker run app:build make test
lint:
docker: *docker
steps:
- restore-docker
- run:
name: Lint Code
command: docker run app:build make lint
workflows:
main:
@ -40,3 +54,7 @@ workflows:
- test:
requires:
- build
- lint:
requires:
- build

3
.flake8 Normal file
View File

@ -0,0 +1,3 @@
[flake8]
max-line-length = 100
exclude = venv/*

View File

@ -8,10 +8,14 @@ ARG HOME="/app"
ENV HOME=${HOME}
RUN groupadd --gid ${USER_ID} ${GROUP_ID} && \
useradd --create-home --uid ${USER_ID} --gid ${GROUP_ID} --home-dir /app ${GROUP_ID}
useradd --create-home --uid ${USER_ID} --gid ${GROUP_ID} --home-dir ${HOME} ${GROUP_ID}
RUN pip install --upgrade pip
COPY requirements.txt ./
COPY requirements.dev.txt ./
RUN pip install -r requirements.dev.txt
WORKDIR ${HOME}
COPY requirements.txt ./
@ -20,8 +24,8 @@ RUN pip install -r requirements.dev.txt
COPY . .
RUN pip install .
# Drop root and change ownership of the application folder to the user
RUN chown -R ${USER_ID}:${GROUP_ID} ${HOME}
USER ${USER_ID}
ENTRYPOINT ["/app/entrypoint"]

11
Makefile Normal file
View File

@ -0,0 +1,11 @@
.PHONY: install lint test
install:
pip install -r requirements.dev.txt
pip install .
lint:
flake8
test:
pytest tests/

View File

@ -2,13 +2,58 @@
This Play Store export is a job to schedule backfills of Play Store data to BigQuery via the BigQuery Data Transfer service.
The purpose of this job is to continuously backfill past days over time.
Past Play Store data has been found to still update over time
(e.g. data from a day two weeks ago can still be updated)
The purpose of this job is to be scheduled to run regularly in order to continuously backfill past days over time.
Past Play Store data has been found to still update over time (e.g. data from a day two weeks ago can still be updated)
so regular backfills of at least 30 days are required.
The BigQuery Play Store transfer job has a non-configurable refresh
window size of 7 days which is insufficient.
This is an issue with the retained installers metric in particular.
The BigQuery Play Store transfer job has a non-configurable refresh window size of 7 days which is insufficient.
These scripts require that a Play Store transfer config already exists and the current gcloud user has
permission to create jobs in the project.
See [Google Play transfers documentation](https://cloud.google.com/bigquery-transfer/docs/play-transfer) for more details.
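To make the windowing concrete, the dates covered by one backfill are the chosen start date plus the days immediately before it; a small illustrative sketch (not code from this repository):
```python
# Illustrative sketch only (not this repository's code): enumerate the dates
# covered by a backfill that starts at `base_date` and walks backwards.
from datetime import date, timedelta


def backfill_dates(base_date, backfill_day_count):
    """Yield base_date, then each earlier day, for backfill_day_count days total."""
    for offset in range(backfill_day_count):
        yield base_date - timedelta(days=offset)


# A 30-day backfill starting at 2020-06-01 covers 2020-06-01 back to 2020-05-03.
dates = list(backfill_dates(date(2020, 6, 1), 30))
print(dates[0], dates[-1])  # 2020-06-01 2020-05-03
```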
## Usage
Start a backfill using the `python3 play_store_export/export.py` script:
```sh
usage: export.py [-h] --date DATE --project PROJECT --transfer-config
TRANSFER_CONFIG [--transfer-location TRANSFER_LOCATION]
[--backfill-day-count BACKFILL_DAY_COUNT]
optional arguments:
-h, --help show this help message and exit
--date DATE Date at which the backfill will start, going backwards
--project PROJECT Either the project that the source GCS project belongs
to or the project that contains the transfer config
--transfer-config TRANSFER_CONFIG
ID of the transfer config. This should be a UUID.
--transfer-location TRANSFER_LOCATION
Region of the transfer config (defaults to `us`)
--backfill-day-count BACKFILL_DAY_COUNT
Number of days to backfill
```
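The help text above is standard argparse output; a rough sketch of a parser that would produce it (an illustration only, not necessarily the exact code in `export.py`, with the `--backfill-day-count` default assumed):
```python
# Hypothetical reconstruction of the CLI shown above; option names and help
# strings mirror the usage output, but any defaults beyond --transfer-location
# are assumptions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--date", required=True,
                    help="Date at which the backfill will start, going backwards")
parser.add_argument("--project", required=True,
                    help="Either the project that the source GCS project belongs to "
                         "or the project that contains the transfer config")
parser.add_argument("--transfer-config", required=True,
                    help="ID of the transfer config. This should be a UUID.")
parser.add_argument("--transfer-location", default="us",
                    help="Region of the transfer config (defaults to `us`)")
parser.add_argument("--backfill-day-count", type=int, default=35,  # assumed default
                    help="Number of days to backfill")
args = parser.parse_args()
```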
## Develop
This project uses the BigQuery Data Transfer Python library:
https://googleapis.dev/python/bigquerydatatransfer/latest/index.html
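For orientation, here is a minimal sketch of triggering one manual run of an existing transfer config with this library. The resource names are placeholders, the request-dict call style assumes a recent `google-cloud-bigquery-datatransfer` release, and the project's actual logic lives in `play_store_export/export.py`:
```python
# Minimal sketch, not this repository's code. PROJECT, LOCATION and CONFIG_ID
# are placeholders; the request-dict call style assumes a recent
# google-cloud-bigquery-datatransfer client.
from datetime import datetime, timezone

from google.cloud import bigquery_datatransfer_v1

client = bigquery_datatransfer_v1.DataTransferServiceClient()

# Fully qualified name of the pre-existing Play Store transfer config.
parent = "projects/PROJECT/locations/LOCATION/transferConfigs/CONFIG_ID"

# Request a single run for one day; the service then executes it like a
# regularly scheduled run.
response = client.start_manual_transfer_runs(
    request={
        "parent": parent,
        "requested_run_time": datetime(2020, 6, 1, tzinfo=timezone.utc),
    }
)
for run in response.runs:
    print(run.name, run.state)
```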
Install python dependencies with:
```sh
make install
```
Run tests with:
```sh
make test
```
Run linter with:
```sh
make lint
```
A script to cancel all running transfer jobs exists in `play_store_export/cancel_transfers.py`
which may be useful during development and testing.
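One plausible shape for that script, sketched against the same client library (an assumption about its approach, not a description of the actual file):
```python
# Hedged sketch only: find in-flight runs for a transfer config and delete them.
# Whether cancel_transfers.py works exactly this way is an assumption.
from google.cloud import bigquery_datatransfer_v1
from google.cloud.bigquery_datatransfer_v1 import TransferState

client = bigquery_datatransfer_v1.DataTransferServiceClient()
parent = "projects/PROJECT/locations/LOCATION/transferConfigs/CONFIG_ID"

runs = client.list_transfer_runs(
    request={"parent": parent, "states": [TransferState.PENDING, TransferState.RUNNING]}
)
for run in runs:
    client.delete_transfer_run(request={"name": run.name})
```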

View File

@ -1,9 +0,0 @@
#!/bin/bash
set -e
if [ "$1" = test ]; then
exec pytest "${@:2}"
else
exec "$@"
fi

View File

@ -103,7 +103,7 @@ def wait_for_transfer(transfer_name: str, timeout: int = 1200, polling_period: i
def start_export(project: str, transfer_config_name: str, transfer_location: str,
base_date: datetime.date, backfill_day_count: int = 35):
base_date: datetime.date, backfill_day_count: int):
"""
Start and wait for the completion of a backfill of `backfill_day_count` days, counting
backwards from `base_date`. The base date is included in the backfill and counts as a
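For reference, a call mirroring the updated signature might look like this; all values are placeholders and the import path is inferred from the package layout:
```python
# Hypothetical invocation; argument names follow the signature in the hunk
# above, and every value shown is a placeholder.
import datetime

from play_store_export.export import start_export

start_export(
    project="my-gcp-project",
    transfer_config_name="00000000-0000-0000-0000-000000000000",  # transfer config UUID
    transfer_location="us",
    base_date=datetime.date(2020, 6, 1),
    backfill_day_count=35,
)
```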

View File

@ -1,2 +1,3 @@
flake8==3.8.2
pytest==5.4.3
-r requirements.txt

26
setup.py Normal file
View File

@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
from setuptools import setup, find_packages
readme = open("README.md").read()
setup(
name="play_store_export",
description="Scripts to export Play Store app data to BigQuery using Transfer Service",
author="Ben Wu",
author_email="bewu@mozilla.com",
url="https://github.com/mozilla/leanplum_data_export",
packages=find_packages(include=["play_store_export"]),
package_dir={"play-store-export": "play_store_export"},
python_requires=">=3.6.0",
version="0.1.0",
long_description=readme,
include_package_data=True,
license="Mozilla",
)