docker-etl

Commit graph

Author	SHA1	Message	Date
James Graham	ef61a80799	Import assigned_to field into BigQuery The field is set to null in case the assignee matches the default.	2024-06-12 10:46:06 +01:00
jgraham	79e660e62b	Webcompat python update (#205) * Convert tests to native pytest format We can avoid the unittest-style wrapping and be more consistent about assert vs assertEquals. * Update dependencies to the latest versions. This is required for compatibility with Python 3.12. * Update the Docker image to Python 3.12 This is the latest stable version so gives us the longest support window. * Switch code formatting and linting to ruff pytest-flake8 is unmaintained, so take this opportunity to move away from flake8+black to ruff. * Update the CI config	2024-06-11 12:47:16 -07:00
JCMOSCON1976	1d2e10715e	Changed xm_password environment var name (#203) Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>	2024-06-07 08:27:31 -04:00
m-d-bowerman	a20f358754	Add funnel forecasting class for search forecasting (#175) * Add funnel forecasting class for search forecasting * Test case update * Updates to write results * Comment and setup prod tables * Change components schema names to match forecast * Updates for ad click forecasts * Change data start dates for mobile forecasts * config update * Address comments * Address comments * Variable for historical indices * Comments * Bump Prophet * Comments	2024-06-06 13:37:20 -07:00
JCMOSCON1976	43e6a402c1	feat:[ASP-4545] Workday - XMatters integration (#199) * First commit * Fixed flake8 errors * Fixed the config.xml after running .\update_ci_config * Changed config.yml * Changed config.yml * Changed pytest version in requirements.txt * Deleted the test step from ci_job.yml * Fixed init and secret files * Changed local github EOL config. * Fixed config.yml * Changes in config.yml * Dos2unix applied to config.yml --------- Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>	2024-06-03 09:44:40 -07:00
Ksenia	b27921e5b6	Fixes #200 - Add an ability to import core bugs as kb bugs (#201)	2024-05-29 14:18:45 -07:00
Ksenia	89d998e052	Fixes #197 - Add translation step to the broken-site-report-ml job (#198)	2024-05-22 11:14:45 -07:00
Brendan Birdsong	0ac787db77	Add new job for DAP Ads PPA Dev Collector (#189)	2024-05-03 14:47:42 -05:00
Ksenia	0594478183	Fixes #187 - Add chunking to bugbug classification in broken_site_report_ml (#188)	2024-04-19 13:36:56 -04:00
Ksenia	fee4f62400	Fixes #185 - Fix missing title error for broken-site-report-ml job (#186)	2024-04-17 14:52:19 -04:00
Ksenia	5b0e9d5fac	Fixes #182 - Update webcompat-kb ETL job to fetch additional bugs and history (#183)	2024-04-09 14:41:22 -07:00
Ksenia	5f5bf5ef5d	Fixes #179 - Change broken_site_report_ml ETL to use live table (#180)	2024-03-22 13:36:28 -07:00
akkomar	a64b17837d	Bump gcp-gcr orb version (#174)	2024-03-04 17:55:46 +01:00
akkomar	2fd00200e1	Remove pioneer_debug job (#173)	2024-02-29 16:46:08 +01:00
akkomar	758cb2b16e	Bump Docker image version config.yml (#172)	2024-02-29 10:31:12 -05:00
m-d-bowerman	0a2b9bd7ae	Model config file setup (#171) * Adjust MetricHub to take segments and WHERE clause * Test with segments and WHERE clause * Remove conditions for segment, groupby query init * Remove leading comma from subquery strings * Config that holds holidays for Prophet * Config file to hold Prophet regressors * Config file by metric setup * Set up holder class for model results * Remove unused config class * Ability to select the last complete month as end date * Fix errors in tests and last_period string check	2024-02-13 13:30:31 -08:00
Ksenia	be11b68f6f	Fixes #169 - Rename the table for bugbug classification results for broken_site_report_ml ETL (#170)	2024-02-07 08:28:18 -08:00
m-d-bowerman	5463ae4c30	Adjust MetricHub to take segments and WHERE clause (#168) * Adjust MetricHub to take segments and WHERE clause * Test with segments and WHERE clause * Remove conditions for segment, groupby query init * Remove leading comma from subquery strings	2024-02-05 11:33:26 -08:00
Chelsea Troy	fb5e9ed0ca	Remove search-term-data-validation (#149)	2024-01-11 15:14:35 -06:00
Ksenia	cb0f21fe33	Fixes #166 - Process unclassified reports that were missed in broken_site_report_ml job (#167)	2024-01-10 10:31:35 -08:00
simon-friedberger	625d82e8c2	Bug 1867139 Update dap collector r=akkomar (#162) Matches newer Janus version and is parallelized Co-authored-by: akkomar <akkomar@users.noreply.github.com>	2023-12-26 21:13:20 +01:00
Ksenia	604663580f	Fixes #163 - Add a job for ML classification of broken site reports using bugbug (#164)	2023-12-14 14:16:45 -08:00
Rebecca BurWei	75c96b128a	fix: set project id (#155) Co-authored-by: Chelsea Troy <chelseatroy@users.noreply.github.com>	2023-12-11 18:40:24 -06:00
Eduardo Filho	d34c73c424	bug 1865082: Remove dev readonly access to prod psql (#161)	2023-11-27 13:28:54 -05:00
Brad Ochocki	528079a57a	Debug changes introduced in PR 156 (#158) Debugs some changes in https://github.com/mozilla/docker-etl/pull/156 that were introduced to make `kpi-forecasting` pip installable: - Moves the `kpi_forecasting.py` script out of the module. Python doesn't like invoking scripts that are part of a module. - Modifies the Dockerfile to install `kpi_forecasting` as a package. - Modifies `MetricHub.query` to be invoked like a method instead of accessed like an attribute. The change from attribute to method was made in the previous PR, but the invocation change wasn't pushed. - Update readme	2023-10-16 10:28:44 -07:00
Brad Ochocki	dc4478dd14	Make kpi forecasting pip installable (#156) * Small changes to make package pip-installable * add __init__ files * delete typo * fully specify import names * remove property decorator * update readme	2023-10-13 11:04:55 -07:00
Chelsea Troy	29915dc833	Add more print statements to figure out when the job is hanging (#150) * Add more print statements to figure out when the job is hanging * Add yet another print	2023-09-19 19:37:26 -05:00
wil stuckey	d8c58ae9b8	Bug 1852038: Add score to the bigquery schema and dataclasses (#147)	2023-09-07 10:39:55 -05:00
Brad Ochocki	e3b67c46bd	Update default forecasting horizon to 18 months (#146) This change will help the revenue team get the forecast data for the horizon that they need.	2023-09-06 15:06:34 -07:00
Chelsea Troy	bcda0430c4	Replace search term validation job (#142) * Add duplicate search term validation job * Update documentation and dependencies * Some kind of issue with flake8 that makes tests fail. Trying a fix * Yeet flake * Remove remaining flake infra * Update CI config with './script/update_ci_config'	2023-09-05 17:47:21 -05:00
Eduardo Filho	2d7ee68986	replace mozagg load_bq bucket cfg with existing bucket (#141)	2023-08-25 11:28:11 -04:00
Chelsea Troy	5100c62b18	Add print statements to better understand where this job is broken in prod (#140)	2023-08-24 11:00:34 -05:00
Eduardo Filho	4043fcc252	Update mozaggregator-backfill bucket (#139)	2023-08-23 16:06:20 -04:00
Ksenia	e11acc964d	Fixes #135 - Add a job to fetch Webcompat Knowledge Base related bugs and store them in BQ (#136) * Fixes #135 - Add a job to fetch Webcompat Knowledgebase related bugs from bugzilla and store them in BQ * Fixes #135 - Use a specific flake8 version	2023-08-17 12:10:07 -07:00
Alexander	9f7333e2da	Switch actions to flags (#134)	2023-07-28 14:01:23 -04:00
Alexander	f4e38a1caa	Add clients-yearly backfill (#133) * Formatting * Added clients-yearly backfill * Support multiple actions * Added clients-daily-with-search * Added end-date	2023-07-28 13:08:49 -04:00
Leif Oines	6dd3d640dd	actually make the clients_daily query run (#130)	2023-07-24 13:22:45 -04:00
Leif Oines	8e2b2805f2	modify churn pool selection and add function to do replacement on attributable clients (#128) * modify churn pool selection and add function that replaces data in attributable_clients_v2 * comment out attributable client replacement code for now * add baseline clients daily and last_seen replacements * delete old usage history function * add sample_id	2023-07-21 09:17:03 -06:00
Chelsea Troy	5bc1b18da6	Fix syntax warning by checking for equality rather than identity on the integer zero. (#126) This works both ways because, specifically in Python, the integers from 0 to 255 are stored once and passed by reference rather than by value, but the value of having the englishy 'is not' in here is outweighed by the value of a maintainer not needing to know that. A maintainer is more likely to recognize the '!=' syntax than to know this weird thing about the Python compiler.	2023-07-12 14:02:31 -05:00
Leif Oines	58432a3ca6	fix churn pool update query (#127)	2023-07-07 15:59:11 -04:00
Alexander	c716c34d2c	Script for client-regeneration simulation (#124) * Initial commit * Account for changes in docker build process * Updated dependencies, addressed comments * Modified README * Update to actually create replacement table --------- Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>	2023-07-05 12:16:49 -04:00
Brad Ochocki	87bb8de82f	fix project string (#125) String contained an underscore, but wasn't caught in local development because I wasn't writing to prod. This case would be a good candidate for unit testing.	2023-06-22 10:18:03 -07:00
Brad Ochocki	82b5c40aeb	Refactor KPI Forecasting (#121) * wip commit * file cleanup * add forecast df validation - move summary measures (mean, quantiles, etc) to config file - improve commenting * update config file * simplify prophet fit method * simplify prophet_forecast * get all output data (proposed and legacy) formatted correctly * add code to write model outputs to BQ ... and various other updates. * get docker container working properly * update readme * update readme (again) * make docker description more complete. * update configs to prod values * simplify dockerfile * change ownership earlier * undo some dockerfile changes * re-add pip install to dockerfile * revert python version change * remove tty * Revert "remove tty" This reverts commit `3f0d2c689a`. * slim down Docker image and reduce pip requirements * update README * update table names * update Dockerfile to have fewer layers also update .gitignore * Update jobs/kpi-forecasting/ci_job.yaml Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com> * remove MetricHub._enquote * remove unnecessary comments * Add context and link to forecasting revamp project * handle legacy `target` setting more robustly. * simplify control flow logic * test different "Test Code" command * Update jobs/kpi-forecasting/ci_job.yaml Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com> * updated project circleci workflow yaml and typo in README * Update README.md * improved error message * update todos to include a ticket reference --------- Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com> Co-authored-by: kik-kik <kignasiak@mozilla.com>	2023-06-21 13:24:52 -07:00
kik-kik	cdc52b62b4	feat(): updated docker version and git image used by circle ci + updated requirements (#123) * updated docker version and git image used by circle ci + updated requirements to fix pip issues * removing black and flake8 pytest plugins as they are outdated and using packages directly * updated jobs to use docker and git image circle ci params * removed trailing white spaces from README * updated template to not use black and flake8 pytest plugins and regenerated circleci config * updated gcp-gcr orb to the latest version * Update requirements.in Co-authored-by: Anna Scholtz <anna@scholtzan.net> --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net>	2023-06-20 16:20:33 +02:00
Alekhya	9f1363d3ec	InfluxDB to BQ ETL (#114) * InfluxDB to BQ ETL * fix ci issues * Incorporate feedback	2023-05-16 16:47:54 -04:00
Brad Ochocki	bffcf0af3b	cast DS to timestamp (#117) https://github.com/mozilla/docker-etl/pull/116 was intended to fix a typing issue when uploading to BQ, but the fix did not work as expected. This fix _should_ work; I created a BQ table to reproduce the error and verify this fix.	2023-05-15 12:30:12 -06:00
Brad Ochocki	aa99ea8f85	cast ds as timestamp (#116) The Airflow task [did not succeed](https://workflow.telemetry.mozilla.org/log?dag_id=kpi_forecasting&task_id=kpi_forecasting_desktop_non_cumulative&execution_date=2023-05-06T04%3A00%3A00%2B00%3A00) after recent PRs. This appears to be due to the type of the `predictions["ds"]`; the current column type is `DATETIME`, but the BQ schema uses `TIMESTAMP`. From the Airflow logs: ``` Provided Schema does not match Table moz-fx-data-shared-prod:telemetry_derived.kpi_automated_forecast_v1. Field ds has changed type from TIMESTAMP to DATETIME ``` This type change was not intentionally made in the previous PRs, but is likely a result of updating the `prophet` package. This PR forces `predictions["ds"]` to be a `TIMESTAMP` at the time of db write.	2023-05-15 11:23:35 -06:00
Brad Ochocki	d0a5a18b79	make forecast deterministic (#115) Following [this comment](https://github.com/facebook/prophet/issues/1124#issuecomment-812904897), this PR sets a `np.random.seed` to ensure that Prophet forecasts are deterministic (and therefore repeatable). I checked that the forecasts are indeed repeatable by running locally after adding the following line to `kpi_forecasting.py` before results were written to bigquery: ```python print(predictions.iloc[:, 1:].sum().sum() + confidences.iloc[:, 4:].sum().sum()) ``` This sums all of the numeric columns in the `predictions` and `confidences` dataframes to provide a quick check that dataframes are equal across runs. This is not the most exhaustive check that could exist, but imo it's a sufficient demonstration for our use case. ``` # without setting np.random.seed > 750505443639.0608 > 750445662883.6375 > 750454470768.1648 # after setting np.random.seed > 750462584055.1995 > 750462584055.1995 > 750462584055.1995 ``` Additional Changes: - Reorder imports	2023-05-12 11:36:53 -05:00
Brad Ochocki	545c6f9f5e	simplify Dockerfile (#113) * simplify Dockerfile ## Overview This PR uses updated versions of Python and `prophet` to greatly simplify the python environment setup in the Dockerfile. The code has been tested by creating a local Docker container, and sample outputs were written to the following tables in `moz-fx-data-bq-data-science.bochocki`: - `tmp_desktop_kpi_forecast` - `tmp_desktop_kpi_forecast_confidences` - `tmp_mobile_kpi_forecast` - `tmp_mobile_kpi_forecast_confidences` ## Additional Changes - `.gitignore`: ignore additional filetypes - `kpi_forecasting.py`: set confidence intervals `target` from `config` instead of relying on hardcoded `"desktop"`. This `target` is overwritten in `write_confidence_intervals_to_bigquery` [here](`4cfbec9153/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py (L116)`), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts. - `PosteriorSampling.py`: minor refactoring required to resolve errors and deprecation warnings that are now being raised by pandas as a result of package upgrades. - `README.md`: update examples - `requirements.txt`: updated packages to get easier-install versions of `prophet` and `statsforecast`. * black format * change `MAINTAINER` label * Revert "change `MAINTAINER` label" This reverts commit `27229dd770`. * include pytest-black	2023-05-10 08:48:26 -05:00
Brad Ochocki	2c751a6f14	Remove Pocket references (#112) This PR mostly cleans up the readme to not include references to QCDOU and Pocket forecasting, but also is a test to make sure that I have PR access for the repo.	2023-05-09 08:02:42 -05:00
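
The reasoning in commit 5bc1b18da6 (#126) in the table above, comparing an integer with zero by equality rather than identity, can be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `has_remaining` and the sample values are hypothetical.

```python
def has_remaining(count: int) -> bool:
    """Hypothetical helper: report whether count is non-zero."""
    # Writing `count is not 0` compares object identity and makes recent
    # Python versions (3.8+) emit a SyntaxWarning. `!=` compares the value.
    return count != 0


# Two integers with the same value are always equal, which is the
# property the check actually needs.
big_a = int("100000")
big_b = int("100000")
values_equal = big_a == big_b

# Whether `big_a is big_b` holds depends on interpreter caching details,
# so this sketch deliberately does not rely on it.
print(has_remaining(0), has_remaining(3), values_equal)
```

Equality states the intent directly, so a maintainer does not need to know which integers a given interpreter happens to cache.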

303 commits

All branches