* Convert tests to native pytest format
We can avoid the unittest-style wrapping and be more consistent about
assert vs assertEquals.
* Update dependencies to the latest versions.
This is required for compatibility with Python 3.12.
* Update the Docker image to Python 3.12
This is the latest stable version so gives us the longest support window.
* Switch code formatting and linting to ruff
pytest-flake8 is unmaintained, so take this opportunity to move away from flake8+black to ruff.
* Update the CI config
* Add funnel forecasting class for search forecasting
* Test case update
* Updates to write results
* Comment and setup prod tables
* Change components schema names to match forecast
* Updates for ad click forecasts
* Change data start dates for mobile forecasts
* config update
* Address comments
* Address comments
* Variable for historical indices
* Comments
* Bump Prophet
* Comments
* First commit
* Fixed flake8 errors
* Fixed the config.xml after running .\update_ci_config
* Changed config.yml
* Changed config.yml
* Changed pytest version in requirements.txt
* Deleted the test step from ci_job.yml
* Fixed init and secret files
* Changed local github EOL config.
* Fixed config.yml
* Changes in config.yml
* Dos2unix applied to config.yml
---------
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* Adjust MetricHub to take segments and WHERE clause
* Test with segments and WHERE clause
* Remove conditions for segment, groupby query init
* Remove leading comma from subquery strings
* Config that holds holidays for Prophet
* Config file to hold Prophet regressors
* Config file by metric setup
* Set up holder class for model results
* Remove unused config class
* Ability to select the last complete month as end date
* Fix errors in tests and last_period string check
* Adjust MetricHub to take segments and WHERE clause
* Test with segments and WHERE clause
* Remove conditions for segment, groupby query init
* Remove leading comma from subquery strings
Debugs some changes in https://github.com/mozilla/docker-etl/pull/156 that were introduced to make `kpi-forecasting` pip installable:
- Moves the `kpi_forecasting.py` script out of the module. Python doesn't like invoking scripts that are part of a module.
- Modifies the Dockerfile to install `kpi_forecasting` as a package.
- Modifies `MetricHub.query` to be invoked like a method instead of accessed like an attribute. The change from attribute to method was made in the previous PR, but the invocation change wasn't pushed.
- Update readme
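A minimal sketch of the attribute-vs-method mix-up described above. `MetricHubSketch`, its constructor arguments, and the SQL string are hypothetical stand-ins for illustration, not the real `MetricHub` class:

```python
class MetricHubSketch:
    """Hypothetical stand-in for MetricHub; only the call pattern matters here."""

    def __init__(self, metric: str, table: str) -> None:
        self.metric = metric
        self.table = table

    def query(self) -> str:
        # Build the SQL string on demand (illustrative query only).
        return f"SELECT submission_date, {self.metric} FROM {self.table}"


hub = MetricHubSketch(metric="daily_active_users", table="example_table")

sql = hub.query()     # correct: invoking the method returns the SQL string
not_sql = hub.query   # bug: attribute access returns the bound method object

print(type(sql).__name__)
print(callable(not_sql))
```

Passing `hub.query` where a string is expected fails only once something tries to use it as SQL, which is why the missing parentheses are easy to overlook.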
* Add duplicate search term validation job
* Update documentation and dependencies
* Some kind of issue with flake8 that makes tests fail. Trying a fix
* Yeet flake
* Remove remaining flake infra
* Update CI config with './script/update_ci_config'
* modify churn pool selection and add function that replaces data in attributable_clients_v2
* comment out attributable client replacement code for now
* add baseline clients daily and last_seen replacements
* delete old usage history function
* add sample_id
This works both ways because, specifically in CPython, small integers (-5 through 256) are cached and stored once, so an identity check on them happens to behave like an equality check. But the value of having the englishy 'is not' in here is outweighed by the value of a maintainer not needing to know that. A maintainer is more likely to recognize the '!=' syntax than to know this quirk of the CPython interpreter.
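A short sketch of why `!=` is the safer spelling. The values below are illustrative and not taken from the job's code; the identity results are a CPython implementation detail, so they are printed rather than relied on:

```python
# Build the integers at runtime so the comparison is between separate
# int objects rather than compile-time constants.
small_a = int("100")
small_b = int("100")
big_a = int("100000")
big_b = int("100000")

# Equality compares values and is correct for every integer.
print(small_a == small_b)
print(big_a == big_b)

# Identity compares object addresses. For a range of small integers CPython
# reuses one cached object, so `is` happens to agree with `==` there, but
# this is not guaranteed by the language and usually fails for larger values.
print(small_a is small_b)
print(big_a is big_b)
```

Using `!=` (or `==`) keeps the comparison correct regardless of the value's size or the interpreter in use.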
String contained an underscore, but wasn't caught in local development because I wasn't writing to prod. This case would be a good candidate for unit testing.
* updated docker version and git image used by circle ci + updated requirements to fix pip issues
* removing black and flake8 pytest plugins as they are outdated and using packages directly
* updated jobs to use docker and git image circle ci params
* removed trailing white spaces from README
* updated template to not use black and flake8 pytest plugins and regenerated circleci config
* updated gcp-gcr orb to the latest version
* Update requirements.in
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
https://github.com/mozilla/docker-etl/pull/116 was intended to fix a typing issue when uploading to BQ, but the fix did not work as expected. This fix _should_ work; I created a BQ table to reproduce the error and verify this fix.
The Airflow task [did not succeed](https://workflow.telemetry.mozilla.org/log?dag_id=kpi_forecasting&task_id=kpi_forecasting_desktop_non_cumulative&execution_date=2023-05-06T04%3A00%3A00%2B00%3A00) after recent PRs. This appears to be due to the type of `predictions["ds"]`: the column is currently written as `DATETIME`, but the BQ schema uses `TIMESTAMP`. From the Airflow logs:
```
Provided Schema does not match Table moz-fx-data-shared-prod:telemetry_derived.kpi_automated_forecast_v1. Field ds has changed type from TIMESTAMP to DATETIME
```
This type change was not intentionally made in the previous PRs, but is likely a result of updating the `prophet` package. This PR forces `predictions["ds"]` to be a `TIMESTAMP` at the time of db write.
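One way such a cast could look, as a hedged sketch rather than the exact change in this PR. It assumes the writer infers the BigQuery column type from the pandas dtype, with timezone-naive datetimes treated as `DATETIME` and UTC-aware ones as `TIMESTAMP`; the sample dataframe is made up:

```python
import pandas as pd

# Made-up stand-in for the Prophet output; only the `ds` column matters here.
predictions = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-05-01", "2023-05-02"]),
        "yhat": [1.0, 2.0],
    }
)

# The column starts out timezone-naive.
print(predictions["ds"].dt.tz)

# Attach UTC so the column represents absolute points in time
# (assumption: the BigQuery writer maps this dtype to TIMESTAMP).
predictions["ds"] = predictions["ds"].dt.tz_localize("UTC")
print(predictions["ds"].dt.tz)
```

An explicit schema passed to the load job would be another option; which approach the writer actually uses is not shown in this description.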
Following [this comment](https://github.com/facebook/prophet/issues/1124#issuecomment-812904897), this PR sets a `np.random.seed` to ensure that Prophet forecasts are deterministic (and therefore repeatable). I checked that the forecasts are indeed repeatable by running locally after adding the following line to `kpi_forecasting.py` before results were written to bigquery:
```python
print(predictions.iloc[:, 1:].sum().sum() + confidences.iloc[:, 4:].sum().sum())
```
This sums all of the numeric columns in the `predictions` and `confidences` dataframes to provide a quick check that the dataframes are equal across runs. This is not the most exhaustive check that could exist, but imo it's a sufficient demonstration for our use case.
```
# without setting np.random.seed
> 750505443639.0608
> 750445662883.6375
> 750454470768.1648
# after setting np.random.seed
> 750462584055.1995
> 750462584055.1995
> 750462584055.1995
```
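The same idea can be reproduced without Prophet. This is a minimal sketch in which `noisy_checksum` is a hypothetical stand-in for "fit, predict, and sum the numeric columns"; it draws from NumPy's global random state, which is what `np.random.seed` controls:

```python
import numpy as np


def noisy_checksum(seed=None):
    """Hypothetical stand-in for one forecast run reduced to a single number."""
    if seed is not None:
        # Reset NumPy's global random state so the draws below repeat exactly.
        np.random.seed(seed)
    samples = np.random.normal(loc=100.0, scale=5.0, size=1000)
    return float(samples.sum())


# Without a seed, each run draws different samples and the checksums differ.
unseeded = {noisy_checksum() for _ in range(3)}

# With the same seed before each run, every checksum is identical.
seeded = {noisy_checksum(seed=42) for _ in range(3)}

print(len(unseeded))
print(len(seeded))
```

A single scalar checksum can in principle hide offsetting differences, so a stricter comparison (for example, comparing the dataframes directly) would be the next step if exact equality matters.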
Additional Changes:
- Reorder imports
* simplify Dockerfile
## Overview
This PR uses updated versions of Python and `prophet` to greatly simplify the Python environment setup in the Dockerfile. The code has been tested by creating a local Docker container, and sample outputs were written to the following tables in `moz-fx-data-bq-data-science.bochocki`:
- `tmp_desktop_kpi_forecast`
- `tmp_desktop_kpi_forecast_confidences`
- `tmp_mobile_kpi_forecast`
- `tmp_mobile_kpi_forecast_confidences`
## Additional Changes
- `.gitignore`: ignore additional filetypes
- `kpi_forecasting.py`: set confidence intervals `target` from `config` instead of relying on hardcoded `"desktop"`. This `target` is overwritten in `write_confidence_intervals_to_bigquery` [here](4cfbec9153/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py (L116)), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts.
- `PosteriorSampling.py`: minor refactoring required to resolve errors and deprecation warnings that are now being raised by pandas as a result of package upgrades.
- `README.md`: update examples
- `requirements.txt`: updated packages to get easier-install versions of `prophet` and `statsforecast`.
* black format
* change `MAINTAINER` label
* Revert "change `MAINTAINER` label"
This reverts commit 27229dd770.
* include pytest-black
This PR mostly cleans up the readme to not include references to QCDOU and Pocket forecasting, but also is a test to make sure that I have PR access for the repo.