Forecasting Best Practices
Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively.
This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud.
The examples and best practices are provided as Python Jupyter notebooks and R markdown files, together with a library of utility functions. We hope that these examples and utilities can reduce the “time to market” by orders of magnitude by simplifying the experience, from defining the business problem to developing solutions. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools in a wide variety of languages.
Content
The following is a summary of models and methods for developing forecasting solutions covered in this repository. The examples are organized according to use cases. Currently, we focus on a retail sales forecasting use case, as it is widely used in assortment planning, inventory optimization, and price optimization. To enable high-throughput forecasting scenarios, we have included examples for forecasting multiple time series with distributed training techniques such as Ray in Python, the parallel package in R, and multi-threading in LightGBM.
Model | Language | Description |
---|---|---|
Auto ARIMA | Python | Auto Regressive Integrated Moving Average (ARIMA) model that is automatically selected |
Linear Regression | Python | Linear regression model trained on lagged features of the target variable and external features |
LightGBM | Python | Gradient boosting decision tree implemented with LightGBM package for high accuracy and fast speed |
DilatedCNN | Python | Dilated Convolutional Neural Network that captures long-range temporal flow with dilated causal connections |
Mean Forecast | R | Simple forecasting method based on historical mean |
ARIMA | R | ARIMA model with or without external features |
ETS | R | Exponential Smoothing algorithm with additive errors |
Prophet | R | Automated forecasting procedure based on an additive model with non-linear trends |
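To make two of the table entries concrete, here is a minimal sketch of the historical-mean baseline and of the lagged-feature construction that a regression model on lagged targets relies on. The function names `mean_forecast` and `make_lag_features` and the sample `sales` values are hypothetical, chosen only for illustration; they are not part of this repository's utility library.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def mean_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Forecast every future step as the mean of the observed history."""
    if not history:
        raise ValueError("history must not be empty")
    level = mean(history)
    return [level] * horizon


def make_lag_features(
    series: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y): each row of X holds the series values at the given lags."""
    max_lag = max(lags)
    X, y = [], []
    # Start at max_lag so that every requested lag is available for each row.
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


# Hypothetical weekly sales for a single store and brand.
sales = [10.0, 12.0, 11.0, 13.0, 14.0, 12.0]

baseline = mean_forecast(sales, horizon=2)
X, y = make_lag_features(sales, lags=[1, 2])

print(baseline)    # two identical values equal to the historical mean
print(X[0], y[0])  # first training row: lag-1 and lag-2 values, then the target
```

The rows in `X` and `y` could then be passed to any regression learner; external features such as price would be appended as extra columns in the same way.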
The repository also comes with AzureML-themed notebooks and best practices recipes to accelerate the development of scalable, production-grade forecasting solutions on Azure. In particular, we have the following examples for forecasting with Azure AutoML as well as tuning and deploying a forecasting model on Azure.
Method | Language | Description |
---|---|---|
Azure AutoML | Python | AzureML service that automates model development process and identifies the best machine learning pipeline |
HyperDrive | Python | AzureML service for tuning hyperparameters of machine learning models in parallel on cloud |
AzureML Web Service | Python | AzureML service for deploying a model as a web service on Azure Container Instances |
Getting Started in Python
To quickly get started with the repository on your local machine, use the following commands.
1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.

2. Clone the repository

       git clone https://github.com/microsoft/forecasting
       cd forecasting/

3. Run setup scripts to create conda environment. Please execute one of the following commands from the root of Forecasting repo based on your operating system.

   - Linux

         ./tools/environment_setup.sh

   - Windows

         tools\environment_setup.bat

   Note that for Windows you need to run the batch script from Anaconda Prompt. The script creates a conda environment forecasting_env and installs the forecasting utility library fclib.

4. Start the Jupyter notebook server

       jupyter notebook

5. Run the LightGBM single-round notebook under the 00_quick_start folder. Make sure that the selected Jupyter kernel is forecasting_env.
If you have any issues with the above setup, or want more detailed instructions on how to set up your environment and run the examples provided in the repository on a local or remote machine, please see the Setup Guide.
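As a quick sanity check after the setup, a notebook cell along the following lines can confirm which environment the kernel runs in and whether fclib can be imported. This is only a sketch: the helper names `active_env_name` and `is_importable` are hypothetical, and it assumes the kernel's Python lives inside a named conda environment, so that the last component of `sys.prefix` is the environment name.

```python
import importlib.util
import os
import sys


def active_env_name() -> str:
    """Return the last path component of the interpreter prefix.

    For a named conda environment this is normally the environment name.
    """
    return os.path.basename(os.path.normpath(sys.prefix))


def is_importable(package: str) -> bool:
    """Report whether a top-level package can be found without importing it."""
    return importlib.util.find_spec(package) is not None


print("environment:", active_env_name())
print("fclib importable:", is_importable("fclib"))
```

If the printed environment is not forecasting_env, or fclib is reported as not importable, switching the kernel in the Jupyter menu is the first thing to try.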
Getting Started in R
We assume you already have R installed on your machine. If not, simply follow the instructions on CRAN to download and install R.
The recommended editor is RStudio, which supports interactive editing and previewing of R notebooks. However, you can use any editor or IDE that supports RMarkdown. In particular, Visual Studio Code with the R extension can be used to edit and render the notebook files. The rendered .nb.html files can be viewed in any modern web browser.
The examples use the Tidyverts family of packages, which is a modern framework for time series analysis that builds on the widely-used Tidyverse family. The Tidyverts framework is still under active development, so it's recommended that you update your packages regularly to get the latest bug fixes and features.
Target Audience
Our target audience for this repository includes data scientists and machine learning engineers with varying levels of knowledge in forecasting as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world forecasting problems.
Contributing
We hope that the open source community will contribute to the content and bring in the latest state-of-the-art algorithms. This project welcomes contributions and suggestions. Before contributing, please see our Contributing Guide.
Reference
The following is a list of related repositories that you may find helpful.
Repository | Description |
---|---|
Deep Learning for Time Series Forecasting | A collection of examples for using deep neural networks for time series forecasting with Keras. |
Microsoft AI Github | Find other Best Practice projects, and Azure AI designed patterns in our central repository. |
Build Status
Build | Branch | Status |
---|---|---|
Linux CPU | master | |
Linux CPU | staging | |