Commit Graph

303 Commits

Author SHA1 Message Date
James Graham ef61a80799 Import assigned_to field into BigQuery
The field is set to null in case the assignee matches the default.
2024-06-12 10:46:06 +01:00
jgraham 79e660e62b
Webcompat python update (#205)
* Convert tests to native pytest format

We can avoid the unittest-style wrapping and be more consistent about
assert vs assertEquals.

* Update dependencies to the latest versions.

This is required for compatibility with Python 3.12.

* Update the Docker image to Python 3.12

This is the latest stable version so gives us the longest support window.

* Switch code formatting and linting to ruff

pytest-flake8 is unmaintained, so take this opportunity to move away from flake8+black to ruff.

* Update the CI config
2024-06-11 12:47:16 -07:00
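The "native pytest format" conversion described in #205 can be illustrated with a minimal sketch. The helper `normalize_status` and both tests are hypothetical and not taken from the repository; they only show the shape of the change from unittest-style wrapping to plain asserts.

```python
import unittest


def normalize_status(status):
    """Hypothetical helper under test: strip and lower-case a status string."""
    return status.strip().lower()


# Before: unittest-style wrapping with assertEqual.
class TestNormalizeStatus(unittest.TestCase):
    def test_normalize(self):
        self.assertEqual(normalize_status(" NEW "), "new")


# After: a plain function with a bare assert, which pytest collects natively.
def test_normalize_status():
    assert normalize_status(" NEW ") == "new"
```

pytest still collects `unittest.TestCase` classes, so a conversion like this can be done one test module at a time.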
JCMOSCON1976 1d2e10715e
Changed xm_password environment var name (#203)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
2024-06-07 08:27:31 -04:00
m-d-bowerman a20f358754
Add funnel forecasting class for search forecasting (#175)
* Add funnel forecasting class for search forecasting

* Test case update

* Updates to write results

* Comment and setup prod tables

* Change components schema names to match forecast

* Updates for ad click forecasts

* Change data start dates for mobile forecasts

* config update

* Address comments

* Address comments

* Variable for historical indices

* Comments

* Bump Prophet

* Comments
2024-06-06 13:37:20 -07:00
JCMOSCON1976 43e6a402c1
feat:[ASP-4545] Workday - XMatters integration (#199)
* First commit

* Fixed flake8 errors

* Fixed the config.xml after running .\update_ci_config

* Changed config.yml

* Changed config.yml

* Changed pytest version in requirements.txt

* Deleted the test step from ci_job.yml

* Fixed init and secret files

* Changed local github EOL config.

* Fixed config.yml

* Changes in config.yml

* Dos2unix applied to config.yml

---------

Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
2024-06-03 09:44:40 -07:00
Ksenia b27921e5b6
Fixes #200 - Add an ability to import core bugs as kb bugs (#201) 2024-05-29 14:18:45 -07:00
Ksenia 89d998e052
Fixes #197 - Add translation step to the broken-site-report-ml job (#198) 2024-05-22 11:14:45 -07:00
Brendan Birdsong 0ac787db77
Add new job for DAP Ads PPA Dev Collector (#189) 2024-05-03 14:47:42 -05:00
Ksenia 0594478183
Fixes #187 - Add chunking to bugbug classification in broken_site_report_ml (#188) 2024-04-19 13:36:56 -04:00
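The chunking mentioned in #188 might look roughly like the sketch below. This is an assumption about the general technique, not the job's actual code: `classify_batch` is a hypothetical callable standing in for whatever sends reports to bugbug, and the default chunk size is arbitrary.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def classify_in_chunks(reports, classify_batch, size=100):
    """Call the (hypothetical) `classify_batch` per chunk and merge the results."""
    results = {}
    for chunk in chunked(reports, size):
        # Each call sees at most `size` reports, which bounds the request size.
        results.update(classify_batch(chunk))
    return results
```

Keeping each request small limits payload size and means a failed call only affects one chunk rather than the whole run.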
Ksenia fee4f62400
Fixes #185 - Fix missing title error for broken-site-report-ml job (#186) 2024-04-17 14:52:19 -04:00
Ksenia 5b0e9d5fac
Fixes #182 - Update webcompat-kb ETL job to fetch additional bugs and history (#183) 2024-04-09 14:41:22 -07:00
Ksenia 5f5bf5ef5d
Fixes #179 - Change broken_site_report_ml ETL to use live table (#180) 2024-03-22 13:36:28 -07:00
akkomar a64b17837d
Bump gcp-gcr orb version (#174) 2024-03-04 17:55:46 +01:00
akkomar 2fd00200e1
Remove pioneer_debug job (#173) 2024-02-29 16:46:08 +01:00
akkomar 758cb2b16e
Bump Docker image version config.yml (#172) 2024-02-29 10:31:12 -05:00
m-d-bowerman 0a2b9bd7ae
Model config file setup (#171)
* Adjust MetricHub to take segments and WHERE clause

* Test with segments and WHERE clause

* Remove conditions for segment, groupby query init

* Remove leading comma from subquery strings

* Config that holds holidays for Prophet

* Config file to hold Prophet regressors

* Config file by metric setup

* Set up holder class for model results

* Remove unused config class

* Ability to select the last complete month as end date

* Fix errors in tests and last_period string check
2024-02-13 13:30:31 -08:00
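The "last complete month as end date" option from #171 can be sketched with the standard library alone. The function name is hypothetical, and the sketch assumes the current month is never treated as complete, even on its final day.

```python
from datetime import date, timedelta


def last_complete_month_end(today):
    """Return the last day of the most recent month that has fully elapsed."""
    # The day before the first of the current month is the last day of the
    # previous month, whatever its length (28, 29, 30 or 31 days).
    return today.replace(day=1) - timedelta(days=1)


print(last_complete_month_end(date(2024, 2, 13)))
```

Subtracting one day from the first of the month avoids any per-month length table and handles leap years and year boundaries.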
Ksenia be11b68f6f
Fixes #169 - Rename the table for bugbug classification results for broken_site_report_ml ETL (#170) 2024-02-07 08:28:18 -08:00
m-d-bowerman 5463ae4c30
Adjust MetricHub to take segments and WHERE clause (#168)
* Adjust MetricHub to take segments and WHERE clause

* Test with segments and WHERE clause

* Remove conditions for segment, groupby query init

* Remove leading comma from subquery strings
2024-02-05 11:33:26 -08:00
Chelsea Troy fb5e9ed0ca
Remove search-term-data-validation (#149) 2024-01-11 15:14:35 -06:00
Ksenia cb0f21fe33
Fixes #166 - Process unclassified reports that were missed in broken_site_report_ml job (#167) 2024-01-10 10:31:35 -08:00
simon-friedberger 625d82e8c2
Bug 1867139 Update dap collector r=akkomar (#162)
Matches newer Janus version and is parallelized

Co-authored-by: akkomar <akkomar@users.noreply.github.com>
2023-12-26 21:13:20 +01:00
Ksenia 604663580f
Fixes #163 - Add a job for ML classification of broken site reports using bugbug (#164) 2023-12-14 14:16:45 -08:00
Rebecca BurWei 75c96b128a
fix: set project id (#155)
Co-authored-by: Chelsea Troy <chelseatroy@users.noreply.github.com>
2023-12-11 18:40:24 -06:00
Eduardo Filho d34c73c424
bug 1865082: Remove dev readonly access to prod psql (#161) 2023-11-27 13:28:54 -05:00
Brad Ochocki 528079a57a
Debug changes introduced in PR 156 (#158)
Debugs some changes in https://github.com/mozilla/docker-etl/pull/156 that were introduced to make `kpi-forecasting` pip installable:
- Moves the `kpi_forecasting.py` script out of the module. Python doesn't like invoking scripts that are part of a module.
- Modifies the Dockerfile to install `kpi_forecasting` as a package.
- Modifies `MetricHub.query` to be invoked like a method instead of accessed like an attribute. The change from attribute to method was made in the previous PR, but the invocation change wasn't pushed.
- Update readme
2023-10-16 10:28:44 -07:00
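The attribute-versus-method mismatch fixed in #158 is easy to reproduce in isolation. The two classes below are hypothetical stand-ins for `MetricHub` before and after the `property` decorator was removed; the SQL string is invented for illustration.

```python
class MetricHubWithProperty:
    """Hypothetical stand-in: `query` exposed as a read-only attribute."""

    def __init__(self, metric):
        self.metric = metric

    @property
    def query(self):
        return f"SELECT {self.metric} FROM metrics"


class MetricHubWithMethod:
    """Hypothetical stand-in: `query` as a plain method (decorator removed)."""

    def __init__(self, metric):
        self.metric = metric

    def query(self):
        return f"SELECT {self.metric} FROM metrics"


old = MetricHubWithProperty("dau")
new = MetricHubWithMethod("dau")

sql_old = old.query    # property: attribute access returns the string
sql_new = new.query()  # method: must be called to get the string
not_sql = new.query    # missing parentheses yields a bound method, not SQL
```

A call site left as `new.query` raises no error by itself; the failure only surfaces later, when the bound method is used where a string is expected.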
Brad Ochocki dc4478dd14
Make kpi forecasting pip installable (#156)
* Small changes to make package pip-installable

* add __init__ files

* delete typo

* fully specify import names

* remove property decorator

* update readme
2023-10-13 11:04:55 -07:00
Chelsea Troy 29915dc833
Add more print statements to figure out when the job is hanging (#150)
* Add more print statements to figure out when the job is hanging

* Add yet another print
2023-09-19 19:37:26 -05:00
wil stuckey d8c58ae9b8
Bug 1852038: Add score to the bigquery schema and dataclasses (#147) 2023-09-07 10:39:55 -05:00
Brad Ochocki e3b67c46bd
Update default forecasting horizon to 18 months (#146)
This change will help the revenue team get the forecast data for the horizon that they need.
2023-09-06 15:06:34 -07:00
Chelsea Troy bcda0430c4
Replace search term validation job (#142)
* Add duplicate search term validation job

* Update documentation and dependencies

* Some kind of issue with flake8 that makes tests fail. Trying a fix

* Yeet flake

* Remove remaining flake infra

* Update CI config with './script/update_ci_config'
2023-09-05 17:47:21 -05:00
Eduardo Filho 2d7ee68986
replace mozagg load_bq bucket cfg with existing bucket (#141) 2023-08-25 11:28:11 -04:00
Chelsea Troy 5100c62b18
Add print statements to better understand where this job is broken in prod (#140) 2023-08-24 11:00:34 -05:00
Eduardo Filho 4043fcc252
Update mozaggregator-backfill bucket (#139) 2023-08-23 16:06:20 -04:00
Ksenia e11acc964d
Fixes #135 - Add a job to fetch Webcompat Knowledge Base related bugs and store them in BQ (#136)
* Fixes #135 - Add a job to fetch Webcompat Knowledgebase related bugs from bugzilla and store them in BQ

* Fixes #135 - Use a specific flake8 version
2023-08-17 12:10:07 -07:00
Alexander 9f7333e2da
Switch actions to flags (#134) 2023-07-28 14:01:23 -04:00
Alexander f4e38a1caa
Add clients-yearly backfill (#133)
* Formatting

* Added clients-yearly backfill

* Support multiple actions

* Added clients-daily-with-search

* Added end-date
2023-07-28 13:08:49 -04:00
Leif Oines 6dd3d640dd
actually make the clients_daily query run (#130) 2023-07-24 13:22:45 -04:00
Leif Oines 8e2b2805f2
modify churn pool selection and add function to do replacement on attributable clients (#128)
* modify churn pool selection and add function that replaces data in attributable_clients_v2

* comment out attributable client replacement code for now

* add baseline clients daily and last_seen replacements

* delete old usage history function

* add sample_id
2023-07-21 09:17:03 -06:00
Chelsea Troy 5bc1b18da6
Fix syntax warning by checking for equality rather than identity on the integer zero. (#126)
This works both ways because, specifically in Python, the integers from 0 to 255 are stored once and passed by reference rather than by value, but the value of having the englishy 'is not' in here is outweighed by the value of a maintainer not needing to know that. A maintainer is more likely to recognize the '!=' syntax than to know this weird thing about the Python compiler.
2023-07-12 14:02:31 -05:00
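The difference between the two checks in #126 shows up as soon as the value is not literally the cached integer object. The sketch below uses hypothetical function names. It relies on CPython's small-integer cache, which is an implementation detail (CPython documents the cached range as -5 to 256, slightly wider than the commit message states).

```python
ZERO = 0  # bound to a name so the identity check below avoids the SyntaxWarning


def nonzero_by_identity(value):
    """Fragile: true whenever `value` is not the very same object as int 0."""
    return value is not ZERO


def nonzero_by_equality(value):
    """Robust: compares values, so 0, 0.0 and False all count as zero."""
    return value != 0


# A float zero equals 0 but is a different object, so the two checks disagree.
print(nonzero_by_identity(0.0), nonzero_by_equality(0.0))
```

The equality form keeps working if the value ever arrives as a float or a NumPy scalar, which is one more reason to prefer it beyond readability.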
Leif Oines 58432a3ca6
fix churn pool update query (#127) 2023-07-07 15:59:11 -04:00
Alexander c716c34d2c
Script for client-regeneration simulation (#124)
* Initial commit

* Account for changes in docker build process

* Updated dependencies, addressed comments

* Modified README

* Update to actually create replacement table

---------

Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
2023-07-05 12:16:49 -04:00
Brad Ochocki 87bb8de82f
fix project string (#125)
String contained an underscore, but wasn't caught in local development because I wasn't writing to prod. This case would be a good candidate for unit testing.
2023-06-22 10:18:03 -07:00
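The unit test suggested in #125 could be as small as the sketch below. The validator and test names are hypothetical. The pattern encodes the Google Cloud project ID rules as an assumption: 6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen.

```python
import re

# Assumed rule for project IDs: 6-30 chars, lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id):
    """Return True if `project_id` matches the assumed naming rule."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


def test_project_id_rejects_underscores():
    assert is_valid_project_id("moz-fx-data-shared-prod")
    assert not is_valid_project_id("moz_fx_data_shared_prod")
```

A check like this runs without credentials, so it would catch the underscore locally even when the developer is not writing to prod.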
Brad Ochocki 82b5c40aeb
Refactor KPI Forecasting (#121)
* wip commit

* file cleanup

* add forecast df validation

- move summary measures (mean, quantiles, etc) to config file
- improve commenting

* update config file

* simplify prophet fit method

* simplify prophet_forecast

* get all output data (proposed and legacy) formatted correctly

* add code to write model outputs to BQ

... and various other updates.

* get docker container working properly

* update readme

* update readme (again)

* make docker description more complete.

* update configs to prod values

* simplify dockerfile

* change ownership earlier

* undo some dockerfile changes

* re-add pip install to dockerfile

* revert python version change

* remove tty

* Revert "remove tty"

This reverts commit 3f0d2c689a.

* slim down Docker image and reduce pip requirements

* update README

* update table names

* update Dockerfile to have fewer layers

also update .gitignore

* Update jobs/kpi-forecasting/ci_job.yaml

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* remove MetricHub._enquote

* remove unnecessary comments

* Add context and link to forecasting revamp project

* handle legacy `target` setting more robustly.

* simplify control flow logic

* test different "Test Code" command

* Update jobs/kpi-forecasting/ci_job.yaml

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>

* updated project circleci workflow yaml and typo in README

* Update README.md

* improved error message

* update todos to include a ticket reference

---------

Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
Co-authored-by: kik-kik <kignasiak@mozilla.com>
2023-06-21 13:24:52 -07:00
kik-kik cdc52b62b4
feat(): updated docker version and git image used by circle ci + updated requirements (#123)
* updated docker version and git image used by circle ci + updated requirements to fix pip issues

* removing black and flake8 pytest plugins as they are outdated and using packages directly

* updated jobs to use docker and git image circle ci params

* removed trailing white spaces from README

* updated template to not use black and flake8 pytest plugins and regenerated circleci config

* updated gcp-gcr orb to the latest version

* Update requirements.in

Co-authored-by: Anna Scholtz <anna@scholtzan.net>

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
2023-06-20 16:20:33 +02:00
Alekhya 9f1363d3ec
InfluxDB to BQ ETL (#114)
* InfluxDB to BQ ETL

* fix ci issues

* Incorporate feedback
2023-05-16 16:47:54 -04:00
Brad Ochocki bffcf0af3b
cast DS to timestamp (#117)
https://github.com/mozilla/docker-etl/pull/116 was intended to fix a typing issue when uploading to BQ, but the fix did not work as expected. This fix _should_ work; I created a BQ table to reproduce the error and verify this fix.
2023-05-15 12:30:12 -06:00
Brad Ochocki aa99ea8f85
cast ds as timestamp (#116)
The Airflow task [did not succeed](https://workflow.telemetry.mozilla.org/log?dag_id=kpi_forecasting&task_id=kpi_forecasting_desktop_non_cumulative&execution_date=2023-05-06T04%3A00%3A00%2B00%3A00) after recent PRs. This appears to be due to the type of the `predictions["ds"]`; the current column type is `DATETIME`, but the BQ schema uses `TIMESTAMP`. From the Airflow logs:
```
Provided Schema does not match Table moz-fx-data-shared-prod:telemetry_derived.kpi_automated_forecast_v1. Field ds has changed type from TIMESTAMP to DATETIME
```

This type change was not intentionally made in the previous PRs, but is likely a result of updating the `prophet` package. This PR forces `predictions["ds"]` to be a `TIMESTAMP` at the time of db write.
2023-05-15 11:23:35 -06:00
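The `DATETIME` versus `TIMESTAMP` mismatch in #116 comes down to timezone-naive versus timezone-aware values: BigQuery's `DATETIME` is a civil time without a zone, while `TIMESTAMP` is an absolute point in time. The PR itself works on a pandas column; the sketch below shows the same idea with the standard library only. It assumes the naive values are meant as UTC, and the helper name and sample row are hypothetical.

```python
from datetime import datetime, timezone


def as_utc_timestamp(value):
    """Attach UTC to a naive datetime; convert an aware one to UTC."""
    if value.tzinfo is None:
        # Assumption: naive values already represent UTC wall-clock time.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


# Hypothetical row shaped like one forecast record.
rows = [{"ds": datetime(2023, 5, 6), "value": 1.0}]
rows = [{**row, "ds": as_utc_timestamp(row["ds"])} for row in rows]
```

Normalizing at write time, as the PR does, keeps the rest of the pipeline free to work with whatever type the forecasting library returns.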
Brad Ochocki d0a5a18b79
make forecast deterministic (#115)
Following [this comment](https://github.com/facebook/prophet/issues/1124#issuecomment-812904897), this PR sets a `np.random.seed` to ensure that Prophet forecasts are deterministic (and therefore repeatable). I checked that the forecasts are indeed repeatable by running locally after adding the following line to `kpi_forecasting.py` before results were written to bigquery:

```python
print(predictions.iloc[:, 1:].sum().sum() + confidences.iloc[:, 4:].sum().sum())
```

This sums all of the numeric columns in the `predictions` and `confidences` dataframes to provide a quick check that the dataframes are equal across runs. This is not the most exhaustive check that could exist, but imo it's a sufficient demonstration for our use case.

```
# without setting np.random.seed
> 750505443639.0608
> 750445662883.6375
> 750454470768.1648

# after setting np.random.seed
> 750462584055.1995
> 750462584055.1995
> 750462584055.1995
```

Additional Changes:
- Reorder imports
2023-05-12 11:36:53 -05:00
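The repeatability check described in #115 can be mimicked without Prophet. The PR seeds NumPy's global generator with `np.random.seed`; the sketch below swaps in the standard library's `random.Random` as a stand-in, and the function name and distribution parameters are made up for illustration.

```python
import random


def simulated_forecast_checksum(seed=None, draws=1000):
    """Sum of pseudo-random draws, standing in for summed forecast samples."""
    # A dedicated generator keeps the seed local instead of touching global state.
    rng = random.Random(seed)
    return sum(rng.gauss(100.0, 5.0) for _ in range(draws))


# With a fixed seed the checksum repeats exactly; with seed=None it varies.
first = simulated_forecast_checksum(seed=42)
second = simulated_forecast_checksum(seed=42)
print(first == second)
```

As in the PR, a single summed value is only a coarse equality check, but it is cheap to compare across runs.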
Brad Ochocki 545c6f9f5e
simplify Dockerfile (#113)
* simplify Dockerfile

## Overview
This PR uses updated versions of Python and `prophet` to greatly simplify the python environment setup in the Dockerfile. The code has been tested by creating a local Docker container, and sample outputs were written to the following tables in `moz-fx-data-bq-data-science.bochocki`:
- `tmp_desktop_kpi_forecast`
- `tmp_desktop_kpi_forecast_confidences`
- `tmp_mobile_kpi_forecast`
- `tmp_mobile_kpi_forecast_confidences`

## Additional Changes
- `.gitignore`: ignore additional filetypes
- `kpi_forecasting.py`: set confidence intervals `target` from `config` instead of relying on hardcoded `"desktop"`. This `target` is overwritten in `write_confidence_intervals_to_bigquery` [here](4cfbec9153/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py (L116)), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts.
- `PosteriorSampling.py`: minor refactoring required to resolve errors and deprecation warnings that are now being raised by pandas as a result of package upgrades.
- `README.md`: update examples
- `requirements.txt`: updated packages to get easier-install versions of `prophet` and `statsforecast`.

* black format

* change `MAINTAINER` label

* Revert "change `MAINTAINER` label"

This reverts commit 27229dd770.

* include pytest-black
2023-05-10 08:48:26 -05:00
Brad Ochocki 2c751a6f14
Remove Pocket references (#112)
This PR mostly cleans up the readme to not include references to QCDOU and Pocket forecasting, but also is a test to make sure that I have PR access for the repo.
2023-05-09 08:02:42 -05:00