Time Series Forecasting Best Practices & Examples
Chenhui Hu 0607fd568f First Release of Forecasting Repo (#181)
* Handled edge case where ts_id_col_names is None

* Split long line into separate lines

* Added notebook template

* Added a test yml file

* Added yml file for python unit test pipeline

* Minor update

* Minor update

* Minor update

* Minor update

* Removed triggers

* Removed triggers

* Created a base ts estimator and inherit BaseTSFeaturizer from the BaseTSEstimator.

* Refactored featurizer class hierarchy.

* Added week of month method.

* add script to source entire

* formatting

* source only test files

* Inherit temporal featurizers from BaseTSFeaturizer.

* Minor update.

* Replaced max_test_timestamp with max_horizon

* Refactored rolling window featurizers.

* Renamed hour_of_year feature to normalized_hour_of_year

* Inherit all normalizers from base normalizer class.

* address review comments for the PR of contributing

* minor update

* address review comments for PR of r test pipeline

* add a test yml file

* Remove checking target column existence, because testing data may not have the target column.

* Create setter and getter of ts_id_col_names.

* Fixed bug caused by unexpected behavior of pandas.shift

* Some code cleanup.

* Updated some featurizer names.

* Some minor changes in df_config and feature configs.

* Some minor changes in feature names.

* Added usage examples in docstring.

* Computation time update after feature engineering refactoring.

* Removed setting frequency.

* Added docstring to convert_to_tsdf function.

* Removed frequency in convert_to_tsdf call.

* Fixed week_of_month function.

* Added popularity featurizer

* Added utility function for checking Iterable but not string.

* Updated LightGBM feature engineering code to use new feature engineering classes.

* Improved checking whether input column names are Iterable and convert to list.

* Made future_value_available a read-only property.

* Minor docstring update.

* Removed extra space in docstring examples.

* Made some methods staticmethods.

* Minor QRF result update after feature engineering code change.

* Removed calling of validate_file and added catching of the exception

* Update python_unit_tests_base.yml for Azure Pipelines [skip ci]

Updated path of the test results

* Test if the download link is wrong

* Fixed minor format issues.

* Fixed minor format issues.

* Fixed formatting issues.

* Fixed line length.

* Removed data files before downloading and checked dimensions of energy data

* Removed the change made for testing

* Changed folder structure of tests and added table to show build status

* Added missing files

* Updated based on review comments

* new folder structure

* add repo metrics

* remove prototypes folder

* add models placeholder

* adjust featurizers to the new structure of folders

* changes in README and evaluation files

* adjust data download to new folders

* delete unnecessary files

* energy load baseline model with new folders

* delete data files

* fix links in benchmarks file

* fix bug

* adjust GBM, QRF and FNN submissions to the new folder structure

* Replace pd.to_timedelta with pd.offsets.

* Added get_offset_by_frequency helper function.

* fix small bugs

* fix small bugs

* Update TSCVSplitter.

* refactored high-level folders

* added a placeholder folder for PR/issue templates

* added subfolders under notebooks/

* updated tests folder

* renamed notebooks/ to examples/

* Update to CONTRIBUTING instructions (#34)

* style checking and formatting files

* git hook installation guide

* issue and PR templates

* minor change

*  working with github instructions

* added specific issue templates

* addressed PR comments

* addressed Chenhui's comment

* addressing chenhuis comments

* conda environment file (#36)

* conda environment file

* updated environment file

* updated instructions for installing conda env

* Vapaunic/lib (#37)

* initial core for forecasting library

* syncing with new structure

* __init__ files in modules

* renamed lib directory

* Added legal headers and some formatting of py files

* restructured benchmarking directory in lib

* fixed imports, warnings, legal headers

* more import fixes and legal headers

* updated instructions with package installation

* barebones library README

* moved energy benchmark to contrib

* formatting changes plus more legal headers

* Added license to the setup

* moved .swp file to contrib, not sure we need to keep it at all

* added missing headers and a brief snippet to README file

* minor wording change in readme

* Chenhui/cpu unit test pipeline (#38)

* address review comments

* added full conda path

* minor change

* added conda to PATH

* added build status in README

* removed energy data prep placeholder notebook

* moved out data energy explore notebook into contrib

* moved data download script to tools/

* Added getting started section to readme

* Added rbase and rbayesm to conda environment

* modified data download script

* added instructions for data download

* renamed data download script

* fixing issues with test pipeline

* parsing issue in yml file

* cleaning up ci test yaml file for more diagnostic info

* fixed a missing argument in instructions

* removed retail directory under dataset module

* moved feature_engineering.py to the feature engineering module

* moved evaluate.py to evaluation module

* combined benchmark settings into a single file

* moved download script to the package and modified the tests

* modified instructions

* fixed the build pipeline yml

* fix to the pipeline yml

* fix to the pipeline yml

* moved serve_folds into ojdata.py

* removed data_schema.py file as all content moved to ojdata.py

* fixed split_train_test in ojdata.py

* moved retail_data_schema into ojdata.py

* moved all oj utilities to ojdata.py

* removed paths from benchmark_settings

* fixed up a docstring

* quick fix a typo

* removed benchmark_settings

* parameterized experiment settings

* refactored experiment settings

* Fixed docstrings

* addressed chenhuis comment around round file naming

* renamed experiment to forecast settings

* Chenhui/light gbm quick start (#40)

* initial example notebook for lightgbm

* reduced to one round forecast

* added text

* added text

* added text

* moved week_of_month to feature engineering utils

* moved df_from_cartesian_product to feature utils

* moved functions to feature utils

* moved functions to feature utils

* added lightgbm model utils

* updated plots

* added text and renamed predict function

* reduced print out frequency in model training

* moved data visualization code to utils

* added text

* updated plot function and added docstring

* renamed the notebook

* updated text

* added NOTICE file, currently empty as we're not redistributing any packages

* Chenhui/add scrapbook (#43)

* added scrapbook support

* Added gitpython to environment.yml file

* added git_repo_path function to utils

* updated notebook

* added test for lightgbm notebook

* included testing of notebooks

* resolve test error

* resolve test error

* added kernel name

* updated kernel name

* trying installing bayesm from cmd

* trying installing bayesm from cmd

* trying installing bayesm from cmd

* excluded notebook test

* excluded notebook test

* added lapack.so link fix

* included notebook tests

* excluded files for notebook test

Co-authored-by: vapaunic <15053814+vapaunic@users.noreply.github.com>

* added integration test

* added initial data prep notebook

* updated notebook

* updated notebook

* updated notebook

* updated url

* init

* model parameters

* removed blank quick start notebooks

* removed blank modeling notebooks

* removed blank evaluation notebooks

* Removed blank model selection notebooks

* removed blank o16n notebooks

* removed outdated text from contrib/README

* removed outdated swp file

* updating .gitignore

* removed change log, as we don't plan to maintain this

* Excluding irrelevant directories

* fix settings

* separated out the setup guide

* fix settings

* simplemodel init

* typo

* add rproj file

* Renaming forecasting_lib to fclib (#59)

* renamed forecasting_lib directory

* modified references to forecasting_lib

* Vapaunic/envname (#61)

* renamed conda env

* modified setup instructions

* minor change in contributing guide

* keep top-level gitignore only

* formatting fixes

* Chenhui/add automl example (#62)

* added multiple linear models and example notebook for AutoML

* removed commented code

* address review comments

* minor update to the notebook

* minor update to the notebook

* added text

* changed types in lightgbm to be consistent with the rest of the code

* modified docstrings in multiple_linear_regression.py

* updated ci yaml files

* changed import statement in conftest.py

* updated gitpython version to the latest

Co-authored-by: vapaunic <15053814+vapaunic@users.noreply.github.com>

* Vapaunic/split bug (#65)

* fixed a yield bug

* removed two blank files

* modified split data function to auto-calculate the splits based on the parameters

* removed forecast_settings module

* removed unused parameter

* modified splitting function to use non-overlapping testing

* tested the split function after the update

* minor fix

* defaults changed in split function

* modified lightgbm example with new split function

* modified automl example (needs verification)

* modified data explore notebook

* quick fix:

* updated data preparation notebook

* changed defaults in split function

* Addressed changes in lightgbm

* addressed issues in automl notebook

* fixed typo in lightgbm plot

* first images of time series split

* updated the pictures

* updated evaluation periods (#66)

* Chenhui/env setup script (#67)

* added a shell script for setting up environment

* changed yaml to yml

* added comments and updated SETUP.md

* modified data preparation notebook with images

* moved r exploration notebook to contrib directory

* modified data explore notebook, updated info about the data, and removed reference to TSPerf

* addressed review feedback and fixed the explore notebook

* Chenhui/multiround lightgbm (#68)

* added initial multiround notebook for lightgbm

* updated data splitting

* updated text

* updated week list

* addressed review comments

* added pyramid-automl to conda file

* first draft of arima notebook

* replace pyramid with pmdarima

* Added a complete function

* minor type

* forecasting across many stores/brands

* complete arima notebook

* renamed data preparation/exploration notebooks

* added git clone to setup

* addressed PR comments

* typo

* Arima to ARIMA

* fixed docstring in plot function

* fixed a bug in MAPE calculation and added plotting

* fixed a bug in predict

* modeling arima on log scale

* Fixing AML Example Notebook (#84)

* Cleaning notebook output, adding get_or_create workspace call, and fixing get_or_create AmlCompute

* Add regression-based models (#64)

* modelling updates

* code tweak

* rebuild

* update mape

* update mape 2

* new forecasting structure

* update eval

* rebuild dataprep

* rebuild with profit

* rm profit

* add plot

* typo

* tidy up

* expand readme

* oops

* clarified setup guide (#94)

* Update SETUP (#95)

minor fix

* Cleaned up unused files and directories (#96)

* removed non-used files

* moved docs into a docs/ dir

* fixed broken links

* Chenhui/dilated cnn example and utils (#76)

* added initial model util file for DCNN

* initial notebook

* added feature utils for DCNN

* updated evaluation and visualization

* removed plot function

* replaced PRED_HORIZON, PRED_STEPS by HORIZON, GAP

* removed log dir if it exists

* updated model utils

* generalized categorical features in dcnn model util

* generalized network definition

* update training code

* format with blackcellmagic

* address review comments and added README

* Chenhui/add ci tests (#146)

* Update conda env with versions (#99)

* 💥

* revert

* minor changes

Co-authored-by: Chenhui Hu <chenhhu@microsoft.com>

* Adding missing Jupyter Extension (#90)

* Update environment.yml

* specified version

Co-authored-by: Chenhui Hu <chenhhu@microsoft.com>

* fix links to examples/ (#104)

* Chenhui/rename notebooks and update automl notebook (#106)

* removed unused module

* added outputs in automl notebook

* fixed a notebook name

* Arima multi-round notebook (#91)

* working arima model

* final auto arima example

* added tqdm to requirements

* addressed review comments

* Revert "Chenhui/rename notebooks and update automl notebook (#106)" (#107)

This reverts commit 032c91d9bfa389f22ae1f1f2150913a4f063bd18 [formerly 15d25213dc].

Co-authored-by: Chenhui Hu <chenhhu@microsoft.com>

* Fixing data download issue (#109)

* removed dependency on __file__ from data download, doesn't work in jupyter

* changed aux to auxdata

* fixed data download function

* fixed path

* auxdata -> auxi

* adding tl;dr directions for setup to README.md (#88)

* adding tl;dr directions for setup to README.md

* added a bit more text

* Cleaned up obsolete (tsperf) code in fclib (#112)

* moved out tsperf files from evaluation module

* moved out tsperf tuning code

* removed more unused files

* Addressing documentation related issues (#111)

* Added conda activate to the setup readme

* added instructions for starting jupyter to setup

* minor

* deleted duplicate instructions

* addressed PR comments

* Chenhui/rename notebooks and updated AutoML example (#108)

* removed unused module

* added outputs in automl notebook

* fixed a notebook name

* updated pytest file

* address review comments

* reran notebook with blackcellmagic

* adding pylint  (#93)

* adding tl;dr directions for setup to README.md

* removing pylint hook and pylint_junit from the env file

* removed pylint config file

* Chenhui/update example folder (#115)

* restructure examples folder

* updated readme

* added readme

* minor update

* removed R folder

* minor change

* fixed a broken link

* another broken link

* fixing notebook tests

* Chenhui/fix aux file path (#118)

* fixed figure links

* changed to auxi_i.csv

* minor change

* [MINOR] Small changes to Arima notebooks (#121)

* fixed a broken link

* minor text changes

* Documentation (#120)

* added target audience section

* added intro on forecasting

* Added fclib documentation

* improved examples readme

* address comments

* added info about the dataset

* added items to be ignored (#123)

* added items to be ignored

* added *.log and score.py

* Chenhui/toplevel readme (#127)

* added content table

* added references

* added external repo links

* minor update

* Chenhui/tune deploy lgbm (#122)

* added notebook and utils

* updated readme links

* fix data path

* updated text

* group imports

* minor update

* using azureml utils to create workspace and compute (#126)

* using azureml utils to create workspace and compute

* group imports

* Download ojdata directly from github (#128)

* new function to download and load oj data directly from bayesm repo

* removed bayesm

* new R function to only load the data

* removed download R function

* minor fix

* added documentation to load_oj_data.R

* added requests to requirements

* fixed a syntax error (#130)

* fix setup.md link (#129)

* fix setup.md link

* mention related use cases

* Vapaunic/cgbuild (#133)

* added files to generate reqs.txt and the ci yml file

* Added notice generation task

* Checking if notice is there

* Update component_governance.yml for Azure Pipelines

* check in notice file

* Update component_governance.yml for Azure Pipelines

* fixed heading

* Chenhui/windows setup (#131)

* initial test

* added batch script and instructions

* align image to center

* adjust image size

* added text

* adjust image size

* address comments

* Readds R material (#116)

* redo R stuff in new dirs

* dirname fixup

* add Rproj file

* rebuild

* fixups

* roxygenise

* copyright notice

* dataprep

* updated yaml

* more updates

* more tweaks

* reg models

* update reg models

* more updates

* reword

* rendered prophet html

* name fix

* add lintr file

* move stuff

* renamed use case folder (#138)

* renamed use case folder

* dirname change

* updated readme

* added notebooks

* fix ci test

* Vapaunic/featutils (#137)

* moved feature engineering module to contrib

* removed lag submod

* cleaned up feature engineering

* rebuild R notebooks (#139)

* Chenhui/toplevel readme (#140)

* added content table

* added references

* added external repo links

* minor update

* updated setup instructions

* added text

* align text

* removed duplicated Content section

* address review comments

* Chenhui/hyperdrive example update (#142)

* removed blackcellmagic

* removed utils under aml_scripts and updated notebook

* added notebook path

* added ci test of lightgbm multi round example

* make forecast round as parameter

* Make -Agent Name

* resolve duplicated function name

* increased time limit and reduce number of rounds

* increase time limit

* added parameters tag to multiround lightgbm and dilatedcnn

* README change (#147)

* minor change

* hide tags

* hide tags

* added parameters tag

* Revert "Chenhui/add ci tests (#146)" (#149)

This reverts commit de7a19cfa7637476b9ebfc92f5c18a26a8eca4da [formerly f8bd22733c].

* Chenhui/add ci tests (#150)

* Revert "Chenhui/add ci tests (#150)" (#151)

This reverts commit 357453234088f2ebb8453bd8cd77527a1c6c2130 [formerly 21846168a7].

* Chenhui/Add CI tests for notebooks

This reverts commit 8a99549da8b9096b65130fd2f6634e2a217b2dd9 [formerly 89e986fe2c].

* minor update

* Added CI tests for example notebooks

* Update component governance pipeline

* Update component governance pipeline

* add ignored items

* Readds R material (#116)

* Chenhui/windows setup (#131)

* Vapaunic/featutils (#137)

* Chenhui/add CI tests for notebooks

* Vapaunic/arimaint (#154)


* modified conftests to add arima

* added tests

* modified notebooks with parameters

* Chenhui/code improvments (#157)

* updated docstring

* pinned package versions

* minor improvements

* minor improvement

* modified metrics to take any iterable (#158)

* improvement: using Ray to parallelize arima fitting (#159)

* using Ray to parallelize arima fitting

* added ray as dependency

* text about ray, disable warnings, and minor stuff

* scipy 1.4.1 or above

* reverting scipy, azuremlsdk issue

* minor mod

Co-authored-by: Vanja Paunic <15053814+vapaunic@users.noreply.github.com>

* chenhui/improve ray output (#166)

* modified arima multiround to run with ray (#167)

* Chenhui/improve doc (#168)

* minor changes

* remove redundancy

* updated text

* improved text in model tuning and deployment notebook

* clarify the data used

* updated text

* added description of the script

* add explanation of gaps in the curve

* add explanation of gaps in the curve

* updated text

* fix typos

* improve documentation and format

* Addressing a few issues around package dependencies (#169)

* synchronizing utils with other OSS AI repos

* exclude xlrd, leftover from tsperf

* exclude urllib3, leftover from tsperf

* moving tqdm to fclib as only used by lib at the moment

* included fclib dependencies in requirements.txt

* lower bounded package versions that we don't need specific versions of

* lower bound gitpython

* Chenhui/improve checking of run completion (#170)

* Chenhui/added ray dashboard (#171)

* Chenhui/update diagram (#172)

* update multiround training diagram

* minor change

* update diagram and minor change

* Addressing doc related issues (#173)

* taking out inventory optimization link

* pulled contributing out of docs

* Chenhui/ray windows (#177)

* add util to check if module exists

* use ray if available or use sequential training

* updated text

* updated text

* reduce code redundancy

* Chenhui/setup scripts (#178)

* move ray to linux setup script

* remove duplicated azureml-sdk to avoid errors

* add ray to ci yaml files

* update azureml-sdk

* update manual setup instructions

* minor change

* Chenhui/content table (#179)

* update readme

* minor change

* minor update

* Chenhui/multiround arima (#180)

* use ray if it is installed

* update text and reran notebook

* add reference

* Chenhui/dilatedcnn windows (#184)

* resolve format issues

* update log path and tensorboard path

* remove subprocess import

* fix path

* change env name to resolve pipeline failures

* Chenhui/hyperdrive windows (#185)

* resolve format issues

* update log path and tensorboard path

* remove subprocess import

* fetch common utils from chenhui/dilatedcnn_windows

* update notebook

* removed explain module and added notebooks module

* get updated ci yml files

* updated kernel name

* Chenhui/enhancement (#186)

* modified module_path

* updated tensorboard section

* rerun notebook

* only submit local run if python path is found

* minor change and rerun notebook

* updated content section (#187)

* updated content section

* minor change

* address comments

* add links

Co-authored-by: Hong Lu <honglu@microsoft.com>
Co-authored-by: ZhouFang928 <ZhouFang928@users.noreply.github.com>
Co-authored-by: pechyony <pechyony@outlook.com>
Co-authored-by: Ubuntu <chenhui@chhdsvmnc6.hyjxgt1qggauhj0g0g2jh3guwb.bx.internal.cloudapp.net>
Co-authored-by: vapaunic <15053814+vapaunic@users.noreply.github.com>
Co-authored-by: Hong Ooi <hongooi@microsoft.com>
Co-authored-by: Daniel Ciborowski <dciborow@microsoft.com>
Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Former-commit-id: 6098ecf68c
2020-04-06 16:17:18 -04:00
| Name | Last commit | Date |
|---|---|---|
| .github | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| R_utils | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| assets | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| contrib | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| docs | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| examples | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| fclib | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| tests | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| tools | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| .flake8 | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| .gitignore | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| .lintr | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| .pre-commit-config.yaml | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| CONTRIBUTING.md | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| LICENSE | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| NOTICE.txt | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| README.md | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| codeofconduct.md | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| forecasting.Rproj | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |
| pyproject.toml | First Release of Forecasting Repo (#181) | 2020-04-06 16:17:18 -04:00 |

README.md

Forecasting Best Practices

Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively.

This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud.

The examples and best practices are provided as Python Jupyter notebooks and R Markdown files, together with a library of utility functions. We hope that these examples and utilities can reduce the “time to market” by orders of magnitude by simplifying the path from defining the business problem to developing a solution. In addition, the example notebooks serve as guidelines that showcase best practices and usage of the tools in both Python and R.

Content

The following is a summary of the models and methods for developing forecasting solutions covered in this repository. The examples are organized by use case. Currently, we focus on a retail sales forecasting use case, as it is widely used in assortment planning, inventory optimization, and price optimization. To enable high-throughput forecasting scenarios, we have included examples for forecasting multiple time series with distributed training techniques such as Ray in Python, the parallel package in R, and multi-threading in LightGBM.

| Model | Language | Description |
|---|---|---|
| Auto ARIMA | Python | Auto Regressive Integrated Moving Average (ARIMA) model that is automatically selected |
| Linear Regression | Python | Linear regression model trained on lagged features of the target variable and external features |
| LightGBM | Python | Gradient boosting decision tree implemented with LightGBM package for high accuracy and fast speed |
| DilatedCNN | Python | Dilated Convolutional Neural Network that captures long-range temporal flow with dilated causal connections |
| Mean Forecast | R | Simple forecasting method based on historical mean |
| ARIMA | R | ARIMA model without or with external features |
| ETS | R | Exponential Smoothing algorithm with additive errors |
| Prophet | R | Automated forecasting procedure based on an additive model with non-linear trends |
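
As a rough illustration of the multi-series setup described above, the sketch below scores a mean-forecast baseline on several independent series and maps the per-series work over a worker pool. It is not code from this repository: the function names and the sales figures are hypothetical, and a `ThreadPoolExecutor` from the Python standard library stands in for Ray so that the example has no dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def mean_forecast(history, horizon):
    """Forecast every future step as the mean of the observed history."""
    level = mean(history)
    return [level] * horizon


def mape(actuals, predictions):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actuals, predictions))


def evaluate_series(item, horizon=2):
    """Hold out the last `horizon` points of one series and score the baseline."""
    series_id, values = item
    train, test = values[:-horizon], values[-horizon:]
    return series_id, mape(test, mean_forecast(train, horizon))


# Hypothetical weekly sales for three (store, brand) series.
sales = {
    ("store_1", "brand_a"): [100, 110, 90, 100, 105, 95],
    ("store_1", "brand_b"): [50, 50, 50, 50, 40, 60],
    ("store_2", "brand_a"): [200, 220, 180, 200, 250, 150],
}

# Each series is independent, so the per-series work can be mapped over a pool.
with ThreadPoolExecutor(max_workers=3) as pool:
    scores = dict(pool.map(evaluate_series, sales.items()))

for series_id, score in sorted(scores.items()):
    print(series_id, round(score, 2))
```

Threads keep the sketch simple but do not speed up CPU-bound fitting in pure Python; for real model training, a process pool or Ray workers would be the more likely choice. The structure stays the same: one independent task per series, collected into a single result table.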

The repository also comes with AzureML-themed notebooks and best-practice recipes to accelerate the development of scalable, production-grade forecasting solutions on Azure. In particular, it includes the following examples for forecasting with Azure AutoML, as well as for tuning and deploying a forecasting model on Azure.

| Method | Language | Description |
|---|---|---|
| Azure AutoML | Python | AzureML service that automates model development process and identifies the best machine learning pipeline |
| HyperDrive | Python | AzureML service for tuning hyperparameters of machine learning models in parallel on cloud |
| AzureML Web Service | Python | AzureML service for deploying a model as a web service on Azure Container Instances |

Getting Started in Python

To get started quickly with the repository on your local machine, follow these steps.

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.

  2. Clone the repository

    git clone https://github.com/microsoft/forecasting
    cd forecasting/
    
  3. Run the setup script to create the conda environment. Execute one of the following commands from the root of the Forecasting repo, depending on your operating system.

    • Linux
    ./tools/environment_setup.sh
    
    • Windows
    tools\environment_setup.bat
    

    Note that for Windows you need to run the batch script from Anaconda Prompt. The script creates a conda environment forecasting_env and installs the forecasting utility library fclib.

  4. Start the Jupyter notebook server

    jupyter notebook
    
  5. Run the LightGBM single-round notebook under the 00_quick_start folder. Make sure that the selected Jupyter kernel is forecasting_env.

If you have any issues with the above setup, or want more detailed instructions on how to set up your environment and run the examples provided in the repository on a local or remote machine, please see the Setup Guide.
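
If the setup does not behave as expected, a quick check from inside Python can help narrow things down. The sketch below is not part of the repository: `module_exists` and `check_environment` are hypothetical helpers. It assumes that conda exposes the name of the activated environment through the `CONDA_DEFAULT_ENV` variable, and it treats `ray` as an optional package that may be missing on some platforms.

```python
import importlib.util
import os
import sys


def module_exists(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def check_environment(expected_env="forecasting_env", packages=("fclib", "ray")):
    """Collect a few facts that help diagnose a broken setup."""
    return {
        # The setup steps above ask for Python >= 3.6.
        "python_ok": sys.version_info >= (3, 6),
        # conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
        "env_ok": os.environ.get("CONDA_DEFAULT_ENV") == expected_env,
        # Which of the expected packages are importable right now.
        "packages": {name: module_exists(name) for name in packages},
    }


report = check_environment()
print(report)
```

Running this in a notebook cell with the `forecasting_env` kernel selected should report `env_ok` and `fclib` as `True` if the setup script completed; a `False` entry points at the step to revisit.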

Getting Started in R

We assume you already have R installed on your machine. If not, simply follow the instructions on CRAN to download and install R.

The recommended editor is RStudio, which supports interactive editing and previewing of R notebooks. However, you can use any editor or IDE that supports RMarkdown. In particular, Visual Studio Code with the R extension can be used to edit and render the notebook files. The rendered .nb.html files can be viewed in any modern web browser.

The examples use the Tidyverts family of packages, which is a modern framework for time series analysis that builds on the widely-used Tidyverse family. The Tidyverts framework is still under active development, so it's recommended that you update your packages regularly to get the latest bug fixes and features.

Target Audience

Our target audience for this repository includes data scientists and machine learning engineers with varying levels of knowledge in forecasting as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world forecasting problems.

Contributing

We hope that the open source community will contribute to the content and bring in the latest state-of-the-art (SOTA) algorithms. This project welcomes contributions and suggestions. Before contributing, please see our Contributing Guide.

Reference

The following is a list of related repositories that you may find helpful.

Deep Learning for Time Series Forecasting: A collection of examples for using deep neural networks for time series forecasting with Keras.
Microsoft AI Github: Find other Best Practice projects, and Azure AI designed patterns in our central repository.

Build Status

| Build | Branch | Status |
|---|---|---|
| Linux CPU | master | Build Status |
| Linux CPU | staging | Build Status |