* refactored base_forecast and prophet_forecast to enable easier testing
* Apply suggestions from code review
change signatures of `fit` and `predict` to take arguments that default to attributes
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* add test for fit
* revert signatures
* made timezone-aware stamps naive
* finished base_forecast tests
* added tests for prophet class
* linting
* fixed divide by zero
* linting again
* adding tests to funnel_forecast
* added tests for funnel_forecast
* feat(workday):remove unwanted fields (#249)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix(exit):Added sys.exit() call (#250)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix issue with call to _get_crossvalidation_metric
* fixed type check
* added string case to aggregate_to_period and added tests
* revert file
* added more tests to prophet_forecast
* removed DotMap
* modified README to make it match better between FunnelForecast and ProphetForecast
* Update jobs/kpi-forecasting/kpi_forecasting/models/base_forecast.py
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* Brad easy fixes
* remove magic year
* added test for more complex segments
* renamed use_holidays to use_all_us_holidays
* typo
* added detail to prophet parameter descriptions
* updated setting of default start date and added tests
* remove print
* moved filter and updated tests to reflect this
---------
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: JCMOSCON1976 <167822375+JCMOSCON1976@users.noreply.github.com>
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
Co-authored-by: m-d-bowerman <mbowerman@mozilla.com>
* refactored base_forecast and prophet_forecast to enable easier testing
* Apply suggestions from code review
change signatures of `fit` and `predict` to take arguments that default to attributes
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* add test for fit
* revert signatures
* made timezone-aware stamps naive
* finished base_forecast tests
* added tests for prophet class
* linting
* fixed divide by zero
* linting again
* adding tests to funnel_forecast
* added tests for funnel_forecast
* feat(workday):remove unwanted fields (#249)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix(exit):Added sys.exit() call (#250)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix issue with call to _get_crossvalidation_metric
* fixed type check
* added string case to aggregate_to_period and added tests
* revert file
* added more tests to prophet_forecast
* Update jobs/kpi-forecasting/kpi_forecasting/models/base_forecast.py
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* Brad easy fixes
* remove magic year
* feat(code):increasing the max_limit from 10 to 40. (#259)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* typo
* revert bugfix in _add_regressors
* update tests to reflect reversion
---------
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: JCMOSCON1976 <167822375+JCMOSCON1976@users.noreply.github.com>
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
Co-authored-by: m-d-bowerman <mbowerman@mozilla.com>
* refactored base_forecast and prophet_forecast to enable easier testing
* Apply suggestions from code review
change signatures of `fit` and `predict` to take arguments that default to attributes
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* add test for fit
* revert signatures
* made timezone-aware stamps naive
* finished base_forecast tests
* added tests for prophet class
* linting
* fixed divide by zero
* linting again
* adding tests to funnel_forecast
* added tests for funnel_forecast
* feat(workday):remove unwanted fields (#249)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix(exit):Added sys.exit() call (#250)
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* fix issue with call to _get_crossvalidation_metric
* fixed type check
* added string case to aggregate_to_period and added tests
* revert file
* typo
* revert bugfix in _add_regressors
* update tests to reflect reversion
---------
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
Co-authored-by: JCMOSCON1976 <167822375+JCMOSCON1976@users.noreply.github.com>
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
Co-authored-by: m-d-bowerman <mbowerman@mozilla.com>
* fix(code):Fix the num_changes logic that was causing false deletions
* fix(code):Remove comments.
* fix(code):Add comments
---------
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
* feat(code):added max_limit parameter and more logs
* feat(code):added limit parameter and logs
---------
Co-authored-by: Julio Cezar Moscon <jcmoscon@gmail.com>
It turns out there's a 10MB limit when inserting data into BigQuery, and
we can sometimes exceed that. The chunk size is set to 25000 because
~50k rows came to about 11MB of JSON, so halving that seems safe.
This also upgrades to Python 3.12 as that's when `itertools.batched` was
introduced. Possibly a silly reason to upgrade, but why not.
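
A minimal sketch of the chunked insert described above. `insert_fn` is a
hypothetical stand-in for whatever call sends rows to BigQuery, and the
fallback `batched` exists only so the sketch also runs on interpreters
older than 3.12; neither is taken from the actual job.

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback that yields tuples of up to n items, like itertools.batched."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch

# From the commit message: ~50k rows was about 11MB of JSON, so use half.
CHUNK_SIZE = 25_000


def insert_in_chunks(rows, insert_fn, chunk_size=CHUNK_SIZE):
    """Call insert_fn once per chunk and return the number of calls made."""
    calls = 0
    for chunk in batched(rows, chunk_size):
        insert_fn(list(chunk))  # insert_fn is a hypothetical insert callable
        calls += 1
    return calls
```

A row count is only a proxy for payload size, so rows that are much wider
than the ones measured could still exceed the limit.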
* refactored base_forecast and prophet_forecast to enable easier testing
* Apply suggestions from code review
change signatures of `fit` and `predict` to take arguments that default to attributes
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* add test for fit
* revert signatures
* made timezone-aware stamps naive
* finished base_forecast tests
* added tests for prophet class
* linting
* fixed divide by zero
* linting again
* choose less confusing dates for tests and leave a comment
---------
Co-authored-by: Brad Ochocki Szasz <bochocki@mozilla.com>
* Remove click dependency.
This doesn't really save any code vs argparse in this case.
* Fix logging and enable command line configuration of level
* Record history of metric scores
To enable us to see what the metrics looked like previously, run a job
that reads the metrics once per day and appends the results to a
history table.
---------
Co-authored-by: Ksenia <kberezina@mozilla.com>
* updating prod ppa collector to support more time precision options
* variable names for start/end of collection periods, split out functions to confirm collection date and task are valid
This ended up being a bit more complicated than I wanted, but since I
had all the types of the records defined in the dataclass, I didn't want
to hard code them all just for a BigQuery schema. So instead this uses
the `generate_schema` function to automatically derive the table schema
from the `Record` classes.
Was it worth doing this way? Probably not. But it works well, so I think
it's worth keeping. This code can be re-used for future tables as well.
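
One way such a derivation could look, as a sketch only. This is not the
job's `generate_schema`; `schema_from_dataclass`, the type mapping, and
`ExampleRecord` are all assumptions made for illustration, and only flat
records with simple field types are handled.

```python
import dataclasses
from typing import Optional, Union, get_args, get_origin

# Assumed mapping from Python types to BigQuery column types.
_BQ_TYPES = {str: "STRING", int: "INTEGER", float: "FLOAT", bool: "BOOLEAN"}


def schema_from_dataclass(cls):
    """Return a list of {name, type, mode} dicts, one per dataclass field."""
    schema = []
    for field in dataclasses.fields(cls):
        py_type, mode = field.type, "REQUIRED"
        # Optional[X] is Union[X, None]; treat it as a NULLABLE column of X.
        if get_origin(py_type) is Union:
            args = [a for a in get_args(py_type) if a is not type(None)]
            py_type, mode = args[0], "NULLABLE"
        schema.append(
            {"name": field.name, "type": _BQ_TYPES[py_type], "mode": mode}
        )
    return schema


@dataclasses.dataclass
class ExampleRecord:  # hypothetical record, not one of the job's classes
    task_id: str
    duration: float
    retries: Optional[int] = None
```

Deriving the schema this way keeps the dataclass as the single place where
field names and types are declared.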
* feat: add class to do validation and module to run it
* replaced black with ruff and updated files
* ruff updates
* update circle ci configs
* revert configs and set config to overwrite existing data
* add tests file with one test
* added aggregation_period to groupby and join
* added more tests
* renamed validator to result_processing
* added to and cleaned up README
* fixes for name change
* fix ambiguous wording
* update modelperformanceanalysis class and rename
* finished tests and made small updates
* modify code to not create client when output project is empty string
* fixing ci
* add more to query_cte docstring
* refactored model_performance_analysis.py to take a config file
* update configs to use real names
* renamed model to forecast
* Jeff Changes
We need a date field to use for partitioning. Apparently the standard
name for this is 'submission_date'. So add this to all records across
all tables.
Bug: 1904928
* fix(fxci): make bigquery table names configurable
The tables are defined and versioned in bigquery-etl. Make the names
configurable so we don't need to rebuild the image when they change.
* fix(fxci): don't assume the last run is the one that generated the pulse event
While testing, I discovered that it's possible for a pulse event to
contain runs later than the one that generated the event. So it's not
safe to simply grab the last run in the list.
Instead we should use the run from the index in the top-level `runId` field.
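
A rough sketch of the lookup described above. The event layout (a
top-level `runId` plus a `status.runs` list) is assumed here, and
`run_for_event` is a hypothetical helper rather than the function used in
the job.

```python
def run_for_event(event):
    """Return the run that produced the event, using the top-level runId as index."""
    run_id = event["runId"]
    runs = event["status"]["runs"]  # assumed location of the run list
    if not 0 <= run_id < len(runs):
        raise ValueError(f"runId {run_id} not found in {len(runs)} runs")
    return runs[run_id]
```

With an event whose `runId` is 0 but whose run list already holds a later
run, this returns the first run, whereas `runs[-1]` would pick the wrong one.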
* fix(fxci): skip tasks that hit 'deadline-exceeded'
* Import bugzilla bugs by use
Rather than importing all the Web Compat product bugs in one query and
then filtering the bugs in code, try to match the queries to the kind
of bug, to reduce the amount of filtering code required.
* Import webcompat:site-report bugs
These are added to those in the Web Compatibility::Site Reports component.
* Add mypy type annotations
This also changes the way we store bugs from lists to dictionaries
mapping bug id to the bug, which reflects the fact that bugs should be
unique by id and we want to look them up.
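
To illustrate the storage change, a small sketch assuming each bug is a
mapping with an integer `id` key. The `Bug` and `BugsById` aliases and
`index_by_id` are hypothetical names, not the ones used in the importer.

```python
from typing import Any, Mapping

Bug = Mapping[str, Any]
BugsById = dict[int, Bug]  # hypothetical alias; needs Python 3.9+


def index_by_id(bugs: list[Bug]) -> BugsById:
    """Index bugs by id; a later bug with the same id replaces an earlier one."""
    return {bug["id"]: bug for bug in bugs}
```

Lookups by id become a single dictionary access, and a bug returned by
more than one query is stored only once.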
* Don't set start_time as a class property
* Add mypy type checker
* Update CI config
* Move to returning fetch status from the function
The advantage over setting a class variable is that we aren't relying
on state, and can depend on mypy to validate that we always check the
returned value and take appropriate action.
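
A sketch of the pattern, under the assumption that a fetch either yields a
list of bugs or fails with a network-style error. `fetch_bugs` and its
`query` argument are hypothetical and do not mirror the importer's real
signatures.

```python
from typing import Any, Callable


def fetch_bugs(
    query: Callable[[], list[dict[str, Any]]],
) -> tuple[bool, dict[int, dict[str, Any]]]:
    """Run query and return (succeeded, bugs keyed by id).

    The status is part of the return value, so callers must unpack it
    instead of reading it from instance state afterwards.
    """
    try:
        results = query()
    except OSError:  # assumed failure mode: network or connection errors
        return False, {}
    return True, {bug["id"]: bug for bug in results}
```

A caller would then write `ok, bugs = fetch_bugs(...)` and handle the
`not ok` case explicitly before using `bugs`.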
* Add breakage_reports entry for kb entries that are also site reports
If a core bug has both the `webcompat:platform-bug` and
`webcompat:site-report` keywords, it will already be considered a kb
entry and a site report, but it also needs an entry pointing to itself
in the breakage_reports table.
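
The rule above could be sketched as follows. The bug layout (a `keywords`
list per bug) and the pair shape are assumptions, `self_breakage_links` is
a hypothetical helper, and the check that the bug is a core bug is left
out for brevity.

```python
BOTH_KEYWORDS = {"webcompat:platform-bug", "webcompat:site-report"}


def self_breakage_links(bugs):
    """Return (bug_id, bug_id) pairs for bugs carrying both keywords.

    bugs maps bug id to a dict with a "keywords" list (assumed layout).
    """
    return [
        (bug_id, bug_id)
        for bug_id, bug in sorted(bugs.items())
        if BOTH_KEYWORDS <= set(bug["keywords"])
    ]
```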
* Apply suggestions from code review
Co-authored-by: Ksenia <kberezina@mozilla.com>
---------
Co-authored-by: Ksenia <kberezina@mozilla.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* handle secrets differently
* removed secret params from main method signature
* simpler adjustment to read values from env vars; still use @click and specify the envvar name for a couple of values