bigquery-etl

Commit Graph

Author	SHA1	Message	Date
Jeff Klukas	7f5079b26e	sqlfiles -> sql_files	2020-02-05 09:46:49 -05:00
Jeff Klukas	7905d7fa21	Parallelize scripts/publish_views Per request of @whd	2020-02-05 09:46:49 -05:00
Anna Scholtz	041376b487	Add init.sql for VR browser clients_daily to dryrun skip list	2020-01-24 12:33:20 -08:00
Daniel Thorn	1f343af30d	format_sql on devtools_panel_usage_v1	2020-01-24 09:12:09 -05:00
Daniel Thorn	1efbe0344a	Add script for self serve deletion (#635 )	2020-01-23 14:52:08 -08:00
Daniel Thorn	5fa7e4e61e	Correctly format scripting keywords (#693 )	2020-01-21 20:05:47 -08:00
Anna Scholtz	efcba3286d	Improvements for CRC32 stored procedure	2020-01-21 14:58:24 -08:00
Daniel Thorn	58bb0183b8	Allow generate_incremental_table to backfill days in reverse (#696 ) by specifying --start as a date after --end	2020-01-21 14:01:00 -08:00
Jeff Klukas	c092f7479c	Syntax error in publish_views for fenix	2020-01-21 16:12:45 -05:00
Jeff Klukas	4f201da964	Filter out histograms in fenix metrics ping from Glean SDK<19	2020-01-21 14:16:51 -05:00
Anna Scholtz	b31fbe3497	Metadata publish improvements and update clients_daily_v6 metadata	2020-01-17 16:03:59 -08:00
Anna Scholtz	165fe50cc8	Script for updating metadata of table	2020-01-17 16:03:59 -08:00
Anna Scholtz	47f77b7c62	Copy metadata.yaml when generating SQL	2020-01-17 16:03:59 -08:00
Jeff Klukas	0ae8143af7	Bug 1609666 Use SAFE_CAST in udf.json_extract_int_map (#681 )	2020-01-16 09:06:14 -08:00
Daniel Thorn	7c134d5617	Enforce format_sql on more files (#659 )	2020-01-10 17:07:21 -08:00
Jeff Klukas	19a4353c97	Bug: make skipping authorized views more robust This was not working in the production context where the target sql dir is under /tmp as discussed in https://github.com/mozilla/bigquery-etl/pull/655#issuecomment-572789847	2020-01-10 13:43:30 -05:00
Daniel Thorn	e871c70e09	Fail on NULL in assert_false udf (#657 ) * Fail on NULL in assert_false udf * Update tests/README.md Co-Authored-By: Anna Scholtz <anna@scholtzan.net> Co-authored-by: Anna Scholtz <anna@scholtzan.net>	2020-01-09 16:15:42 -08:00
Daniel Thorn	2f7de8683d	Enforce script/format_sql for all new sql files (#656 )	2020-01-09 13:55:46 -08:00
Jeff Klukas	ac4a17c33f	Add list of authorized views to exempt from publishing Related to CI failures addressed in https://github.com/mozilla/bigquery-etl/issues/653	2020-01-09 11:04:50 -05:00
Jeff Klukas	a5b5da6220	Inject null for negative session_length and event timestamps As of https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/474 we will be allowing negative values for session lengths and event timestamps. We provide some safety by modifying the user-facing view for main pings to null out these negative values (since large negative values may cause significant skew in aggregations). We don't modify the user-facing view for event pings, but rather assume users will be applying the deanonymize_events UDF and handle nulling negative values there. See bugs https://bugzilla.mozilla.org/show_bug.cgi?id=1592012 and https://bugzilla.mozilla.org/show_bug.cgi?id=1602521	2020-01-09 09:28:55 -05:00
Anthony Miyaguchi	8ed78e2a18	Fix #653 - CI failing on activity_stream/tile_id_types/views.sql (#654 )	2020-01-08 15:32:12 -08:00
Jeff Klukas	78619180d0	Use labels to determine glean pings	2020-01-08 10:11:11 -05:00
Jeff Klukas	be92fd1a4e	Add parsed start and end times to views on top of Glean schemas Closes https://github.com/mozilla/gcp-ingestion/issues/633	2020-01-08 10:11:11 -05:00
Daniel Thorn	8ca73c2b60	Rewrite script/format_sql in python (#640 )	2020-01-06 16:17:41 -08:00
Frank Bertsch	de80cfd652	RFM View for LTV (#611 ) * Add new UDFs for BYTE column/day_seen Rename bitpos to align with the new convention. * Add search_rfm dataset for LTV * Move RFM calculations to UDF * Address review feedback * Fully escape UDFs * Fix _actual_ missing UDF * Don't dryrun; access denied	2019-12-19 18:01:44 -05:00
Jeff Klukas	d66f64d4ab	Bug: publish_views ignored some view.sql files Closes #600 We improve the logic for parsing view definition files, and we also now error out when we encounter a view.sql file that doesn't match our parsing rather than silently skipping.	2019-12-18 13:10:51 -05:00
Sunah Suh	ddfaf63f83	Add athena query migration script for posterity (#592 ) * Add athena query migration script for posterity * Add legacy athena migration script to pytest exclusions	2019-12-12 17:55:02 -05:00
Frank Bertsch	6c825425b3	Search clients last seen (#451 ) * Improve error message for ndjson parsing * Make JSON error messages nicer * Cast BYTES fields to/from string BYTES types are not JSON-serializable. To deal with that, we do two things: 1. Assume the input tables are hex strings, and decode them to get the BYTES fields values (on input) 2. Encode BYTES fields as hex strings (on output) This means that any data files use hex strings for BYTES fields. Note: This only works on top-level fields * Add better discrepancy reporting for test assertions When JSON blobs differ, it can be hard to tell what is wrong. These functions easily show what's different, and automatically prints them to be available when tests fail. * Add search_clients_last_seen for 1-year of history This new dataset, search_clients_last_seen, contains a year of history for each client. It is split into 3 main parts: 1. Recent info that is contained in search_clients_daily, similar to how we store that in clients_last_seen 2. A year of history, represented as a BYTES field, indicating which days they were active for different types of activity 3. Among the major search providers, arrays of totals of different metrics, split into 12 parts, to account for each months total This dataset will power LTV. * Fix linting issues * Enforce sampling on search_clients_daily * Address review feedback - Change all bits/bytes functions to include no. of bits - Use fileobj for tests - Rename some vars - Use base64 for bytes in/out * Generate sql * Add missing comma * Move search_clients_ls to search_derived * Generate moar sql * Use clients_daily_v8 * Fix query * Move tests to search_derived * Fix tests for search_clients_daily_v8 * Don't dryrun with search_clients_last_seen * Update udf/new_monthly_engine_searches_struct.sql Co-Authored-By: Jeff Klukas <jeff@klukas.net> * sample_id is now an int * Add documentation * Update schemas * Make tests use int sample-id	2019-12-12 12:43:09 -05:00
Ben Wu	50932354ce	Add script for publishing csv's as static tables (#582 )	2019-12-11 14:25:33 -05:00
Jeff Klukas	d1948fa445	Revert "Add authorized views for payload_bytes raw and error" This reverts commit `1d2fa74f9e`.	2019-12-10 14:36:14 -05:00
Jeff Klukas	b77a2b9ac1	Revert "Add all relevant views to the authorization list" This reverts commit `4f064f2e6d`.	2019-12-10 14:36:14 -05:00
Jeff Klukas	4f064f2e6d	Add all relevant views to the authorization list	2019-12-10 13:58:56 -05:00
Jeff Klukas	1d2fa74f9e	Add authorized views for payload_bytes raw and error Supports https://github.com/mozilla/bigquery-etl/issues/360	2019-12-10 13:58:56 -05:00
Ben Wu	7d9782b1ba	Bug 1543434 - Create search datasets for mobile (#559 )	2019-12-06 13:28:53 -05:00
Anthony Miyaguchi	b938356d48	Bug 1601139 - Add query to sample documents per doctype (#570 ) * Bug 1601139 - Add query to sample documents per doctype * Add docstring, fix formatting, and update column name	2019-12-04 13:46:00 -08:00
Daniel Thorn	e11d009aac	Fix case expected for XCOM_PUSH (#575 ) and add spaces to avoid issues with empty variables	2019-12-04 12:35:26 -08:00
Sunah Suh	f45591a834	Fix #550 : create airflow xcom output file in dockerfile and make writ… (#553 )	2019-12-04 14:32:08 -05:00
Daniel Thorn	c70d2e179d	Reimplement experiments_v1 as SQL (#565 ) * Reimplement experiments_v1 as SQL * Apply suggestions from code review Co-Authored-By: Sunah Suh <github@sunahsuh.com> * Update templates/telemetry_derived/experiments_v1/get_experiment_list.py * fix generate_sql	2019-12-03 15:08:57 -08:00
Jeff Klukas	0094f4ba7d	Normalize metadata in generated views on historical ping tables Merging this change will cause the changes to be used on the next deploy of the schema generation pipeline See https://github.com/mozilla-services/cloudops-infra/blob/master/projects/data-shared/Jenkinsfile.bigquery.prod#L105	2019-12-02 15:21:32 -05:00
Sunah Suh	25b702d082	Add tables to replace experiment enrollment aggregates spark streamin… (#524 ) * Add tables to replace experiment enrollment aggregates spark streaming job * Switch to python to fill in date in enrollment aggregates live view since parameters are not allowed in view defs * Direct output of arbitrary commands in entrypoint script to airflow xcom location	2019-11-27 16:55:21 -05:00
Marina Samuel	4465965f14	Code cleanup.	2019-11-25 15:30:33 -05:00
Daniel Thorn	ce28624b74	Improve format-sql for views and timestamp functions (#528 )	2019-11-25 14:15:32 -05:00
Daniel Thorn	9468f997ab	Make addons and addon_aggregates exactly replace spark versions (#532 )	2019-11-25 13:14:52 -05:00
Jeff Klukas	52e0a1acab	Move nondesktop KPI queries to stable table DAG This makes it explicit that we no longer are using imported Parquet data. It also moves several of these tables to the shared-prod project where we want them to live long-term.	2019-11-20 10:22:29 -05:00
Ben Wu	c0160496d4	Add views for search dataset (#513 )	2019-11-18 19:26:15 -05:00
Daniel Thorn	8822b522aa	Promote sql clients_daily_v6 (#501 )	2019-11-14 18:08:49 -05:00
Daniel Thorn	4b80ee2c23	Support restoring int columns in export_to_parquet (#506 )	2019-11-14 10:41:03 -05:00
Ben Wu	73dc724086	Switch search to read from flattened main summary (#500 )	2019-11-13 13:20:14 -08:00
Daniel Thorn	7b1c9d96ad	Support bigquery export to parquet via avro (#492 )	2019-11-07 13:56:33 -05:00
Daniel Thorn	9176dc940e	Unnest clients_daily (#481 )	2019-11-06 17:14:02 -05:00
Marina Samuel	384a017935	Update dryrun script.	2019-11-06 16:13:26 -05:00
Daniel Thorn	f239525363	Fix skip list for publishing udfs (#489 )	2019-11-06 15:28:34 -05:00
Daniel Thorn	fe4ec05c93	Add --dataset_id and --project_id to script/run_multipart_query (#488 )	2019-11-06 13:54:25 -05:00
Daniel Thorn	1d871edc0b	Add list of udfs to skip when publishing (#487 )	2019-11-06 13:37:47 -05:00
Daniel Thorn	dfb54323bf	Fix detection of maps when table includes dataset (#485 )	2019-11-06 12:37:56 -05:00
Daniel Thorn	eba9db159d	Add support for replacing columns in export_to_parquet (#443 )	2019-11-05 14:17:06 -05:00
Daniel Thorn	890141c140	Reimplement main_summary_v4 as SQL (#258 )	2019-11-05 13:32:25 -05:00
Anthony Miyaguchi	4c44359310	Fix #465 - Add regex strings to STRING_REGEX in format-sql (#470 )	2019-10-31 17:04:26 -07:00
Anthony Miyaguchi	084b960602	Use `#!/usr/bin/env python3` consistently (#461 )	2019-10-30 14:06:22 -07:00
Daniel Thorn	b65fbbbd93	Fix automatic formatting for nested types (#457 ) without this `<` and `>` are getting formatted as operators instead of parens	2019-10-29 22:17:06 -07:00
Daniel Thorn	cad76a9b6f	Fix CI by skipping slow-to-validate queries (#448 )	2019-10-25 09:15:20 -05:00
Sunah Suh	387536cbee	Fixes to json -> table ddl generator script (#444 )	2019-10-24 14:56:19 -07:00
Anthony Miyaguchi	d60c0fd842	Add dataset for monitoring schema errors over time (#442 ) * Add query for last month of schema errors * Add generated sql for schema error counts * Move schemas into correct location * Add document_version and named groups * Skip schema error counts in dryrun	2019-10-23 15:37:05 -07:00
Daniel Thorn	0f433f6a91	Support many billing projects and dates in copy_deduplicate (#426 ) * Support many billing projects and dates in copy_deduplicate * fix docs for --project_id * explain default --billing_projects behavior * Fix return value bug	2019-10-21 11:16:24 -07:00
Marina Samuel	a42d97af2d	Add new queries to dryrun.	2019-10-17 15:25:19 -04:00
Daniel Thorn	9bf053de74	Add --preceding-days option to copy_deduplicate (#413 )	2019-10-14 08:54:40 -07:00
Jeff Klukas	096a209ced	Fix bugs in monitoring views Also cleans up a bug in the script for publishing views to get udf_js/gunzip working, and removes accidental print statements in generate_sql.	2019-10-10 11:48:28 -04:00
Jeff Klukas	68c4d79228	Replace sql dir all at once in generate_sql I got tired of running generate_sql, then checking git status while it was running and seeing a jumble of deleted files. This PR changes the behavior to build the files in a temp dir and then copy into place only at the end.	2019-10-10 09:21:30 -04:00
Frank Bertsch	239fab252a	Pipeline sql (#388 ) * Views for monitoring structured ingestion errors * Add UDF for extracting missing columns * Add docs for json_extract_missing_cols * Fix missing WITH clause * Add generated sql * Fix test function * Update views * Update tests * Use shared-prod * Update docstring for missing cols udf * Move schemas to structured_ingestion dataset * Don't dryrun queries on payload_bytes * Change format of payload_bytes views - Use a with_ratio to reduce duplication - Change the view name to match the filename * Fix missing test function * Move to monitoring dataset * Move to new test structure * Remove spaces from js udfs * Use persistent gunzip UDF	2019-10-09 10:03:58 -04:00
Jeff Klukas	2e821fbdc1	Use the shared-prod URL for the dryrun script	2019-10-08 20:21:48 -04:00
Daniel Thorn	e872a76860	Add pytest plugins to lint python scripts (#410 ) * Add pytest plugins to lint python scripts * Fix lint errors	2019-10-08 14:00:11 -07:00
Jeff Klukas	fff2ff3275	fxa_users_services tables (#396 )	2019-10-08 13:31:01 -04:00
Daniel Thorn	8ccce8702c	Use chunksize=1 for consistent ordering in script/generate_incremental_table (#403 )	2019-10-07 10:39:47 -07:00
Ben Wu	484d5079a7	Add additional fields to search datasets (#381 )	2019-09-30 10:26:28 -04:00
Daniel Thorn	6158817ea3	Improve list_tables speed for script/copy_deduplicate (#382 )	2019-09-26 11:28:22 -07:00
Daniel Thorn	91ba7297d5	Fix incorrect parameter name in copy_deduplicate (#383 )	2019-09-25 13:30:29 -07:00
Daniel Thorn	16a1491821	Add --slices option to copy_deduplicate (#380 )	2019-09-25 11:27:36 -07:00
Daniel Thorn	6c05a96847	dont create temp tables for dry_run	2019-09-24 13:52:36 -04:00
Jeff Klukas	29f44ebbf5	Bug: datetime logic in copy_deduplicate Closes https://github.com/mozilla/bigquery-etl/issues/376 Addition to datetime.date considers only the days part of a timedelta, so we have to convert to a timestamp first.	2019-09-24 13:52:36 -04:00
Daniel Thorn	54ae019ff3	Set temp table expiration in copy_deduplicate (#374 )	2019-09-23 13:03:19 -07:00
Daniel Thorn	4f48ae21ef	Add --hourly option to copy_deduplicate (#370 )	2019-09-23 10:54:21 -07:00
Jeff Klukas	f0f5ec99a6	Import events from FxA oauth server I've already created the new table and backfilled existing dates. Addresses #348	2019-09-19 14:01:58 -04:00
Daniel Thorn	469c03ec10	Add script to format sql (#173 )	2019-09-18 17:48:53 -07:00
Jeff Klukas	0ea63c7775	dryrun script updates	2019-09-17 15:30:59 -04:00
Jeff Klukas	f4c5ea8e7c	Run black	2019-09-13 10:00:33 -04:00
Sunah Suh	f9c611a906	Fix UDF publisher script (#330 )	2019-09-04 10:54:48 -05:00
Jeff Klukas	71ad6652f5	Update publish_views script for new directory structure	2019-08-28 20:58:43 -04:00
Jeff Klukas	c2269a69af	Update generate_view script for new directory structure Closes #317	2019-08-28 20:58:43 -04:00
Sunah Suh	030ca5872a	Add script to recreate table creation DDL SQL from json description (#309 ) Add script to recreate table creation DDL SQL from json description	2019-08-28 11:49:42 -05:00
Daniel Thorn	99fe0dfd9e	Move queries into destination-table directories (#286 ) * Move queries into destination-table directories * Apply suggestions from code review Co-Authored-By: Jeff Klukas <jeff@klukas.net>	2019-08-26 12:52:49 -07:00
Anna Scholtz	9580029e20	UDF for unzipping gzipped bytes (#272 ) * UDF for decompressing gzip data * Update script for publishing UDFs to upload UDF dependency files * Address review feedback for gunzip UDF * Set default GCS bucket to moz-fx-data-prod-bigquery-etl * Add function to upload UDF dependencies to GCS * Set data-eng-circleci-tests context in CircleCI config * Add approval step in CircleCI config	2019-08-26 10:53:06 -07:00
Jeff Klukas	5b005570db	Make copy_deduplicate query more efficient for large tables Closes #307	2019-08-23 14:43:17 -04:00
Anna Scholtz	7a6f7aacf8	View generation fixes	2019-08-21 15:03:50 -07:00
Anna Scholtz	52061238f5	Add wrapper script for generating and publishing views	2019-08-21 15:03:50 -07:00
Anna Scholtz	1d87797bd6	Fix rebase conflicts	2019-08-21 15:03:50 -07:00
Anna Scholtz	7520de5092	Script for auto-generating views	2019-08-21 15:03:50 -07:00
Jeff Klukas	7e533b4b24	Apply suggestions from code review Co-Authored-By: Daniel Thorn <dthorn@mozilla.com>	2019-08-20 16:47:33 -04:00
Jeff Klukas	a407f50c14	Explain the case of ignoring tables with trailing underscore	2019-08-20 16:47:33 -04:00
Jeff Klukas	5c06876590	Support altering target project in publish_views script	2019-08-20 16:47:33 -04:00
Jeff Klukas	12215f88fa	Generate latest-version views for derived tables	2019-08-20 16:47:33 -04:00
Daniel Thorn	91e1be5394	Use mode last in clients_daily_v7 (#86 )	2019-08-14 14:49:38 -07:00
Jeff Klukas	55cdf93c5f	instantiate client after parsing args	2019-08-07 13:18:31 -04:00
Jeff Klukas	a929285ca0	Apply suggestions from code review Co-Authored-By: Daniel Thorn <dthorn@mozilla.com>	2019-08-07 13:18:31 -04:00
Jeff Klukas	46232da366	Refactor dryrun script for concurrency and permissions	2019-08-07 13:18:31 -04:00
Jeff Klukas	fbc1b9f564	Fix up imports	2019-08-07 13:18:31 -04:00
Jeff Klukas	937d294d65	Add publish_views script	2019-08-07 13:18:31 -04:00
Jeff Klukas	f2ecc96ce8	Adapt view update script for generating definitions instead	2019-08-07 13:18:31 -04:00
Daniel Thorn	28ce0d1f11	Add script for updating latest-version views	2019-08-07 13:18:31 -04:00
Daniel Thorn	22520e31f6	Use prepend_udf_usage_definitions in generate_sql (#287 )	2019-08-05 16:10:31 -07:00
Daniel Thorn	e1bf990b9a	Add support for testing queries with persistent UDFs (#285 )	2019-08-05 14:14:19 -07:00
Daniel Thorn	a241017c15	Reuse bigquery client and set default project_id (#282 )	2019-08-02 13:36:10 -07:00
Daniel Thorn	5308e79570	detect errors when publishing udfs (#281 )	2019-08-02 13:16:43 -07:00
Jeff Klukas	15640b831f	Support options with underscores and fix incorrect variable	2019-08-01 14:00:57 -04:00
Jeff Klukas	0bc42132a2	Add --parallelism option	2019-08-01 10:15:33 -04:00
Jeff Klukas	4242d95777	Remove new entrypoint clause	2019-08-01 10:15:33 -04:00
Jeff Klukas	00cef9d7e9	Run black and refactor --only and --except args	2019-08-01 10:15:33 -04:00
Jeff Klukas	ccb65d6d18	Apply suggestions from code review Co-Authored-By: Daniel Thorn <dthorn@mozilla.com>	2019-08-01 10:15:33 -04:00
Jeff Klukas	c9f65a7af8	Refactor to allow jobs to run in a different project I believe Airflow may need to issue the jobs from derived-datasets for the time being, so we make sure to fully qualify all table references with the project_id that's passed as a parameter.	2019-08-01 10:15:33 -04:00
Jeff Klukas	0a3d4ea59d	Fix typo Co-Authored-By: Anna Scholtz <anna@scholtzan.net>	2019-08-01 10:15:33 -04:00
Jeff Klukas	a275cfb9e5	Add copy_deduplicate script Closes #220 A PR to add schedule this script in Airflow to follow.	2019-08-01 10:15:33 -04:00
Allen Short	351b42e84a	Dry-run each query in CircleCI against prod datasets (#261 ) * Dry-run each query in CircleCI against prod datasets * Apply suggestions from code review * Update script/dryrun	2019-07-30 10:26:25 -07:00
Daniel Thorn	f79d075448	Add dataset names to paths in sql/ (#265 ) * Add dataset names to paths in sql/ * rename clients_last_seen_raw_v1 to clients_last_seen_v1 * rename telemetry_raw to telemetry_derived * address review	2019-07-30 09:39:22 -07:00
Jeff Klukas	01cb6e1074	Refactor naming of UDFs	2019-07-24 09:01:13 -04:00
Jeff Klukas	680c26ac41	Efficiency tweak: avoid double-publishing UDFs	2019-07-24 09:01:13 -04:00
Anna Scholtz	ff356466f6	Bugfix: missing UDFs without dependencies	2019-07-22 13:44:28 -07:00
Anna Scholtz	4f897edd8a	Add project ID when creating UDFs	2019-07-22 13:44:28 -07:00
Anna Scholtz	c3d06f94d2	Script to publish persistent UDFs	2019-07-22 13:44:28 -07:00
Daniel Thorn	d6e35295ec	Fix help page for script/generate_incremental_table (#244 )	2019-07-22 12:58:35 -07:00
Anna Scholtz	7207a4e52f	Move SQL templates to templates/ and add generated SQL	2019-06-25 08:07:26 -07:00
Anna Scholtz	fe7325dcb4	Run SQL generation script in when creating docker image	2019-06-25 08:07:26 -07:00
Anna Scholtz	aa637154c5	Ensure that UDFs are added only once and in order when generating SQL files	2019-06-25 08:07:26 -07:00
Anna Scholtz	a6661c5896	Trigger SQL query generation in pytest and update CircleCI config	2019-06-25 08:07:26 -07:00
Anna Scholtz	b62970f3a9	Makefile for generating sql and add newline breaks to new files	2019-06-25 08:07:26 -07:00
Anna Scholtz	f2efcc0432	Adopt CircleCI script to generate SQL queries	2019-06-25 08:07:26 -07:00
Anna Scholtz	fd21ba88c2	Add Python script to generate SQL files with UDF declarations	2019-06-25 08:07:26 -07:00
Jeff Klukas	5eb134ca86	fixups found while running the deletions	2019-05-23 16:42:08 -04:00
Jeff Klukas	120153dabe	respond to review comments	2019-05-23 16:42:08 -04:00
Jeff Klukas	845fa792c3	Codify archiving of exact mau table	2019-05-23 16:42:08 -04:00
Jeff Klukas	76a8a23e54	Use generate_incremental_table for clients_last_seen backfill	2019-05-23 16:42:08 -04:00
Jeff Klukas	d9669e325a	Add comments on tables that do not exist in BQ	2019-05-23 16:42:08 -04:00
Jeff Klukas	d481d93861	delete from experiments and search_clients_daily datasets	2019-05-23 16:42:08 -04:00
Jeff Klukas	090fce87cb	Better handling for bq tables	2019-05-23 16:42:08 -04:00
Jeff Klukas	b421eeeb11	Correct time range for gs	2019-05-23 16:42:08 -04:00
Jeff Klukas	9970aebd4e	Add delete-from-bq.sh	2019-05-23 16:42:08 -04:00
Jeff Klukas	7c8cbbc0f4	Bug 1550814 Remove data collected during hotfix rollout See https://bugzilla.mozilla.org/show_bug.cgi?id=1550814	2019-05-23 16:42:08 -04:00
Jeff Klukas	e782c5f6ff	Add --destination-table and selectExprs options to export_to_parquet We implemented a view-based solution for creating clients_last_seen from clients_last_seen_raw in Athena and Presto, but this made the table unavailable from Spark. By adding in these options, we can materialize view logic at the time of writing to Parquet, so that it will be available to all Parquet consumers.	2019-05-22 14:01:40 -04:00
Jeff Klukas	e0dcbdaf36	Usability improvements for generate_incremental_table	2019-05-16 09:41:14 -04:00
Daniel Thorn	1dad4e14b4	Rewrite generate_incremental_table in python (#126 ) fixes #115	2019-05-14 09:22:31 -07:00
Daniel Thorn	c2e416cefd	Create ~/.bigqueryrc without GCLOUD_SERVICE_KEY (#112 )	2019-05-01 13:38:31 -07:00
Daniel Thorn	606fec9c04	Set sane defaults for bq use in airflow (#110 )	2019-05-01 08:24:57 -07:00
Daniel Thorn	e38d3d6f14	Fix --submission-date for export_to_parquet.py (#105 )	2019-04-25 13:32:52 -07:00
Daniel Thorn	8abd41397d	Add script to automate generating incremental tables (#103 )	2019-04-25 09:10:29 -07:00
Daniel Thorn	52d10541d3	Create and publish docker image (#95 ) * Create and publish docker image * Update config.yml	2019-04-19 09:19:36 -07:00
Daniel Thorn	099ff31aae	Add pyspark script for exporting to parquet (#89 ) * Add pyspark script for exporting to parquet * address review	2019-04-17 12:07:41 -07:00


454 Commits