* modified conftests to add arima

* added tests

* modified notebooks with parameters
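For context on how these changes fit together: the notebooks gain a cell tagged `parameters` plus a `STORE_SUBSET` switch, and the new conftest entries and integration tests execute them with papermill and read the logged MAPE back with scrapbook. A minimal sketch of that flow, assuming the quick-start notebook path below (the kernel name, parameter name, and scrap name are taken from the diffs that follow; the exact path is illustrative):

```python
# Minimal sketch of the papermill/scrapbook flow used by the new tests.
# The notebook path is illustrative; the real paths come from the `notebooks`
# fixture in conftest.py (see the conftest diff further down).
import papermill as pm
import scrapbook as sb

notebook_path = "examples/grocery_sales/python/00_quick_start/autoarima_single_round.ipynb"
output_path = "output.ipynb"

# papermill injects the given parameters right after the cell tagged
# "parameters", overriding the defaults defined in the notebook
# (e.g. STORE_SUBSET = True).
pm.execute_notebook(
    notebook_path,
    output_path,
    kernel_name="forecast_cpu",
    parameters=dict(STORE_SUBSET=True),
)

# scrapbook reads back the values the executed notebook "glued", e.g. MAPE.
nb = sb.read_notebook(output_path)
df = nb.scraps.dataframe
mape = df.loc[df["name"] == "MAPE", "data"].iloc[0]
print(f"MAPE: {mape:.2f} %")
```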


Former-commit-id: e6d47ee770
This commit is contained in:
vapaunic 2020-03-24 15:31:36 +00:00 committed by GitHub
Parent 2866b95a25
Commit 2657c1c7bb
28 changed files with 1379 additions and 1443 deletions

2
.lintr
View File

@@ -15,4 +15,4 @@ linters: with_defaults(
single_quotes_linter = NULL,
pipe_continuation_linter = NULL,
cyclocomp_linter = NULL
)
)

View File

@@ -1,8 +1,8 @@
# Forecasting examples
This folder contains Python and R examples for building forecasting solutions presented in Python Jupyter notebooks and R Markdown files, respectively. The examples are organized according to forecasting scenarios in different use cases with each subdirectory under `examples/` named after the specific use case.
This folder contains Python and R examples for building forecasting solutions presented in Python Jupyter notebooks and R Markdown files, respectively. The examples are organized according to forecasting scenarios in different use cases with each subdirectory under `examples/` named after the specific use case.
At the moment, the repository contains a single retail sales forecasting scenario utilizing [Dominick's OrangeJuice data set](https://www.chicagobooth.edu/research/kilts/datasets/dominicks). The name of the directory is `grocery_sales`.
At the moment, the repository contains a single retail sales forecasting scenario utilizing [Dominick's OrangeJuice data set](https://www.chicagobooth.edu/research/kilts/datasets/dominicks). The name of the directory is `grocery_sales`.
## Summary

View File

@@ -4,7 +4,7 @@ output: html_notebook
---
_Copyright (c) Microsoft Corporation._<br/>
_Licensed under the MIT License._
_Licensed under the MIT License._
In this notebook, we generate the datasets that will be used for model training and validating.

File diff suppressed because one or more lines are too long

View File

@@ -4,7 +4,7 @@ output: html_notebook
---
_Copyright (c) Microsoft Corporation._<br/>
_Licensed under the MIT License._
_Licensed under the MIT License._
```{r, echo=FALSE, results="hide", message=FALSE}
library(tidyr)

File diff suppressed because one or more lines are too long

View File

@@ -4,7 +4,7 @@ output: html_notebook
---
_Copyright (c) Microsoft Corporation._<br/>
_Licensed under the MIT License._
_Licensed under the MIT License._
```{r, echo=FALSE, results="hide", message=FALSE}
library(tidyr)

File diff suppressed because one or more lines are too long

View File

@@ -4,7 +4,7 @@ output: html_notebook
---
_Copyright (c) Microsoft Corporation._<br/>
_Licensed under the MIT License._
_Licensed under the MIT License._
```{r, echo=FALSE, results="hide", message=FALSE}
library(tidyr)

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
# Forecasting examples in R: orange juice retail sales
The Rmarkdown notebooks in this directory are as follows. Each notebook also has a corresponding HTML file, which is the rendered output from running the code.
The Rmarkdown notebooks in this directory are as follows. Each notebook also has a corresponding HTML file, which is the rendered output from running the code.
- [`01_dataprep.Rmd`](01_dataprep.Rmd) creates the training and test datasets
- [`02_basic_models.Rmd`](02_basic_models.Rmd) fits a range of simple time series models to the data, including ARIMA and ETS.

View File

@@ -3,4 +3,4 @@ HORIZON: 2
GAP: 2
FIRST_WEEK: 40
LAST_WEEK: 156
START_DATE: "1989-09-14"
START_DATE: "1989-09-14"

View File

@@ -1,6 +1,6 @@
# Forecasting examples
This folder contains Python and R examples for building forecasting solutions on the Orange Juice dataset which is part of the [Dominick's dataset](https://www.chicagobooth.edu/research/kilts/datasets/dominicks). The examples are presented in Python Jupyter notebooks and R Markdown files, respectively.
This folder contains Python and R examples for building forecasting solutions on the Orange Juice dataset which is part of the [Dominick's dataset](https://www.chicagobooth.edu/research/kilts/datasets/dominicks). The examples are presented in Python Jupyter notebooks and R Markdown files, respectively.
## Orange Juice Dataset
@@ -19,8 +19,8 @@ Note that the week number starts from 40 in this dataset, while the full Dominic
The following summarizes each directory of the forecasting examples.
| Directory | Content | Description |
|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [python](./python) | [00_quick_start/](./python/00_quick_start) <br>[01_prepare_data/](./python/01_prepare_data) <br> [02_model/](./python/02_model) <br> [03_model_tune_deploy/](./python/03_model_tune_deploy/) | <ul> <li> Quick start examples for single-round training </li> <li> Data exploration and preparation notebooks </li> <li> Multi-round training examples </li> <li> Model tuning and deployment example </li> </ul> |
| [R](./R) | [01_dataprep.Rmd](R/01_dataprep.Rmd) <br> [02_basic_models.Rmd](R/02_basic_models.Rmd) <br> [02a_reg_models.Rmd](R/02a_reg_models.Rmd) <br> [02b_prophet_models.Rmd](R/02b_prophet_models.Rmd) | <ul> <li>Data preparation</li> <li>Basic time series models</li> <li>ARIMA-regression models</li> <li>Prophet models</li> </ul> |
| Directory | Content | Description |
| --- | --- | --- |
| [python](./python)| [00_quick_start/](./python/00_quick_start) <br>[01_prepare_data/](./python/01_prepare_data) <br> [02_model/](./python/02_model) <br> [03_model_tune_deploy/](./python/03_model_tune_deploy/) | <ul> <li> Quick start examples for single-round training </li> <li> Data exploration and preparation notebooks </li> <li> Multi-round training examples </li> <li> Model tuning and deployment example </li> </ul> |
| [R](./R) | [01_dataprep.Rmd](R/01_dataprep.Rmd) <br> [02_basic_models.Rmd](R/02_basic_models.Rmd) <br> [02a_reg_models.Rmd](R/02a_reg_models.Rmd) <br> [02b_prophet_models.Rmd](R/02b_prophet_models.Rmd) | <ul> <li>Data preparation</li> <li>Basic time series models</li> <li>ARIMA-regression models</li> <li>Prophet models</li> </ul> |

File diff suppressed because one or more lines are too long

View File

@@ -6,7 +6,7 @@
"source": [
"<i>Copyright (c) Microsoft Corporation.</i>\n",
"\n",
"<i>Licensed under the MIT License.</i>"
"<i>Licensed under the MIT License.</i> "
]
},
{

View File

@@ -91,7 +91,11 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"# Use False if you've already downloaded and split the data\n",
@@ -114,7 +118,11 @@
" \"start_q\": 0,\n",
" \"max_p\": 5,\n",
" \"max_q\": 5,\n",
"}"
"}\n",
"\n",
"\n",
"# Run notebook on a subset of stores (to reduce the run time)\n",
"STORE_SUBSET = True"
]
},
{
@@ -222,7 +230,9 @@
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"def process_test_df(test_df):\n",
@@ -251,12 +261,12 @@
"source": [
"Now let's run model training across all the stores and brands, and across all rounds. We will re-run the same code to automatically search for the best parameters, simply wrapped in a for loop iterating over stores and brands.\n",
"\n",
"> **NOTE**: Since we are building a model for each time series sequentially (900+ time series for each store and brand), it would take about 1 hour to run the following cell over all stores. To speed up the execution, we model only a subset of ten stores in each round (exacution time ~8 minutes). To change this behavior, and run ARIMA modeling over *all stores and brands*, switch the boolean indicator `subset_stores` to `False`."
"> **NOTE**: Since we are building a model for each time series sequentially (900+ time series for each store and brand), it would take about 1 hour to run the following cell over all stores. To speed up the execution, we model only a subset of ten stores in each round (exacution time ~8 minutes). To change this behavior, and run ARIMA modeling over *all stores and brands*, switch the boolean indicator `STORE_SUBSET` to `False` under the **Parameters** section on top."
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -271,7 +281,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"---------- Round 1 ----------\n",
"-------- Round 1 --------\n",
"Training ARIMA model ...\n"
]
},
@@ -280,14 +290,14 @@
"output_type": "stream",
"text": [
"\r",
" 20%|██ | 1/5 [01:32<06:09, 92.41s/it]"
" 20%|██ | 1/5 [01:14<04:56, 74.04s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---------- Round 2 ----------\n",
"-------- Round 2 --------\n",
"Training ARIMA model ...\n"
]
},
@@ -296,14 +306,14 @@
"output_type": "stream",
"text": [
"\r",
" 40%|████ | 2/5 [03:04<04:36, 92.18s/it]"
" 40%|████ | 2/5 [02:24<03:38, 72.98s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---------- Round 3 ----------\n",
"-------- Round 3 --------\n",
"Training ARIMA model ...\n"
]
},
@@ -312,14 +322,14 @@
"output_type": "stream",
"text": [
"\r",
" 60%|██████ | 3/5 [04:39<03:06, 93.24s/it]"
" 60%|██████ | 3/5 [04:27<02:55, 87.85s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---------- Round 4 ----------\n",
"-------- Round 4 --------\n",
"Training ARIMA model ...\n"
]
},
@@ -328,14 +338,14 @@
"output_type": "stream",
"text": [
"\r",
" 80%|████████ | 4/5 [06:20<01:35, 95.47s/it]"
" 80%|████████ | 4/5 [06:39<01:41, 101.20s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"---------- Round 5 ----------\n",
"-------- Round 5 --------\n",
"Training ARIMA model ...\n"
]
},
@@ -343,15 +353,15 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [08:05<00:00, 97.15s/it]"
"100%|██████████| 5/5 [08:56<00:00, 107.30s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 23min 38s, sys: 1min 9s, total: 24min 48s\n",
"Wall time: 8min 5s\n"
"CPU times: user 18min 19s, sys: 1min 6s, total: 19min 25s\n",
"Wall time: 8min 56s\n"
]
},
{
@@ -365,8 +375,6 @@
"source": [
"%%time\n",
"\n",
"# CHANGE to False to model across all stores\n",
"subset_stores = True\n",
"\n",
"# Create an empty df to store predictions\n",
"result_df = pd.DataFrame(None, columns=[\"predictions\", \"store\", \"brand\", \"week\", \"actuals\", \"round\"])\n",
@@ -386,7 +394,7 @@
" store_list = train_filled[\"store\"].unique()\n",
" brand_list = train_filled[\"brand\"].unique()\n",
"\n",
" if subset_stores:\n",
" if STORE_SUBSET:\n",
" store_list = store_list[0:10]\n",
"\n",
" for store, brand in itertools.product(store_list, brand_list):\n",
@@ -442,7 +450,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -452,8 +460,8 @@
"MAPE values for each forecasting round:\n",
"round\n",
"1 57.72\n",
"2 77.08\n",
"3 63.12\n",
"2 77.25\n",
"3 63.17\n",
"4 74.93\n",
"5 73.70\n",
"dtype: float64\n"
@@ -462,7 +470,7 @@
{
"data": {
"application/scrapbook.scrap.json+json": {
"data": 69.22142436904007,
"data": 69.26658812449644,
"encoder": "json",
"name": "MAPE",
"version": 1
@@ -481,7 +489,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Overall MAPE is 69.22 %\n"
"Overall MAPE is 69.27 %\n"
]
}
],
@@ -513,7 +521,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -565,6 +573,7 @@
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "forecasting_env",
"language": "python",

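The NOTE in the cell above describes the sequential per-series training and the `STORE_SUBSET` switch. Below is a rough, hypothetical sketch of that loop, assuming pmdarima's `auto_arima` is the model behind the notebook; the parameter-dict name, target column name, and helper function are illustrative, not taken from the notebook:

```python
# Hypothetical sketch of the per-(store, brand) auto-ARIMA loop described in
# the NOTE above. STORE_SUBSET limits the run to the first ten stores.
import itertools

import pandas as pd
from pmdarima.arima import auto_arima

STORE_SUBSET = True  # papermill-injectable parameter, as in the notebook
ARIMA_PARAMS = {"start_p": 0, "start_q": 0, "max_p": 5, "max_q": 5}  # assumed dict name
TARGET_COL = "move"  # assumed name of the unit-sales column


def fit_and_forecast(train_filled: pd.DataFrame, horizon: int) -> pd.DataFrame:
    """Fit one model per (store, brand) series and return stacked forecasts."""
    store_list = train_filled["store"].unique()
    brand_list = train_filled["brand"].unique()
    if STORE_SUBSET:
        store_list = store_list[0:10]  # model only a subset of stores

    results = []
    for store, brand in itertools.product(store_list, brand_list):
        mask = (train_filled["store"] == store) & (train_filled["brand"] == brand)
        series = train_filled.loc[mask, TARGET_COL]
        model = auto_arima(series, seasonal=False, error_action="ignore", **ARIMA_PARAMS)
        preds = model.predict(n_periods=horizon)
        results.append(pd.DataFrame({"predictions": preds, "store": store, "brand": brand}))
    return pd.concat(results, ignore_index=True)
```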
View File

@@ -6,7 +6,7 @@
"source": [
"<i>Copyright (c) Microsoft Corporation.</i>\n",
"\n",
"<i>Licensed under the MIT License.</i>"
"<i>Licensed under the MIT License.</i> "
]
},
{

View File

@@ -6,7 +6,7 @@
"source": [
"<i>Copyright (c) Microsoft Corporation.</i>\n",
"\n",
"<i>Licensed under the MIT License.</i>"
"<i>Licensed under the MIT License.</i> "
]
},
{

View File

@@ -1,5 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Licensed under the MIT License.
"""
Perform cross validation of a LightGBM forecasting model on the training data of the 1st forecast round.

View File

@@ -6,7 +6,7 @@
"source": [
"<i>Copyright (c) Microsoft Corporation.</i>\n",
"\n",
"<i>Licensed under the MIT License.</i>"
"<i>Licensed under the MIT License.</i> "
]
},
{

View File

@@ -1,16 +1,16 @@
# Forecasting examples in Python
This folder contains Jupyter notebooks with Python examples for building forecasting solutions. To run the notebooks, please ensure your environment is set up with required dependencies by following instructions in the [Setup guide](../../../docs/SETUP.md).
This folder contains Jupyter notebooks with Python examples for building forecasting solutions. To run the notebooks, please ensure your environment is set up with required dependencies by following instructions in the [Setup guide](../../../docs/SETUP.md).
## Summary
The following summarizes each directory of the Python best practice notebooks.
| Directory | Content | Description |
|-------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [00_quick_start](./00_quick_start) | [autoarima_single_round.ipynb](./00_quick_start/autoarima_single_round.ipynb) <br>[azure_automl_single_round.ipynb](./00_quick_start/azure_automl_single_round.ipynb) <br> [lightgbm_single_round.ipynb](./00_quick_start/lightgbm_single_round.ipynb) | Quick start notebooks that demonstrate workflow of developing a forecasting model using one-round training and testing data |
| [01_prepare_data](./01_prepare_data) | [ojdata_exploration.ipynb](./01_prepare_data/ojdata_exploration.ipynb) <br> [ojdata_preparation.ipynb](./01_prepare_data/ojdata_preparation.ipynb) | Data exploration and preparation notebooks |
| [02_model](./02_model) | [dilatedcnn_multi_round.ipynb](./02_model/dilatedcnn_multi_round.ipynb) <br> [lightgbm_multi_round.ipynb](./02_model/lightgbm_multi_round.ipynb) <br> [autoarima_multi_round.ipynb](./02_model/autoarima_multi_round.ipynb) | Deep dive notebooks that perform multi-round training and testing of various classical and deep learning forecast algorithms |
| [03_model_tune_deploy](./03_model_tune_deploy/) | [azure_hyperdrive_lightgbm.ipynb](./03_model_tune_deploy/azure_hyperdrive_lightgbm.ipynb) <br> [aml_scripts/](./03_model_tune_deploy/aml_scripts) | <ul><li> Example notebook for model tuning using Azure Machine Learning Service and deploying the best model on Azure </ul></li> <ul><li> Scripts for model training and validation </ul></li> |
| Directory | Content | Description |
| --- | --- | --- |
| [00_quick_start](./00_quick_start)| [autoarima_single_round.ipynb](./00_quick_start/autoarima_single_round.ipynb) <br>[azure_automl_single_round.ipynb](./00_quick_start/azure_automl_single_round.ipynb) <br> [lightgbm_single_round.ipynb](./00_quick_start/lightgbm_single_round.ipynb) | Quick start notebooks that demonstrate workflow of developing a forecasting model using one-round training and testing data|
| [01_prepare_data](./01_prepare_data) | [ojdata_exploration.ipynb](./01_prepare_data/ojdata_exploration.ipynb) <br> [ojdata_preparation.ipynb](./01_prepare_data/ojdata_preparation.ipynb) | Data exploration and preparation notebooks|
| [02_model](./02_model) | [dilatedcnn_multi_round.ipynb](./02_model/dilatedcnn_multi_round.ipynb) <br> [lightgbm_multi_round.ipynb](./02_model/lightgbm_multi_round.ipynb) <br> [autoarima_multi_round.ipynb](./02_model/autoarima_multi_round.ipynb) | Deep dive notebooks that perform multi-round training and testing of various classical and deep learning forecast algorithms|
| [03_model_tune_deploy](./03_model_tune_deploy/) | [azure_hyperdrive_lightgbm.ipynb](./03_model_tune_deploy/azure_hyperdrive_lightgbm.ipynb) <br> [aml_scripts/](./03_model_tune_deploy/aml_scripts) | <ul><li> Example notebook for model tuning using Azure Machine Learning Service and deploying the best model on Azure </ul></li> <ul><li> Scripts for model training and validation </ul></li> |

View File

@@ -9,4 +9,4 @@ UseSpacesForTab: Yes
NumSpacesForTab: 4
Encoding: UTF-8
RnwWeave: knitr
RnwWeave: knitr

View File

@@ -1,5 +1,5 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Pull request against these branches will trigger this build
pr:

View File

@@ -24,5 +24,7 @@ def notebooks():
"lightgbm_quick_start": os.path.join(quick_start_path, "lightgbm_single_round.ipynb"),
"lightgbm_multi_round": os.path.join(model_path, "lightgbm_multi_round.ipynb"),
"dilatedcnn_multi_round": os.path.join(model_path, "dilatedcnn_multi_round.ipynb"),
"autoarima_quick_start": os.path.join(quick_start_path, "autoarima_single_round.ipynb"),
"autoarima_multi_round": os.path.join(model_path, "autoarima_multi_round.ipynb"),
}
return paths

View File

@@ -1,5 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Licensed under the MIT License.
import os
import pytest
@@ -21,6 +21,21 @@ def test_lightgbm_quick_start(notebooks):
assert mape == pytest.approx(35.60, abs=ABS_TOL)
@pytest.mark.integration
def test_autoarima_quick_start(notebooks):
notebook_path = notebooks["autoarima_quick_start"]
output_notebook_path = os.path.join(os.path.dirname(notebook_path), "output.ipynb")
pm.execute_notebook(
notebook_path, output_notebook_path, kernel_name="forecast_cpu", parameters=dict(STORE_SUBSET=True),
)
nb = sb.read_notebook(output_notebook_path)
df = nb.scraps.dataframe
assert df.shape[0] == 1
mape = df.loc[df.name == "MAPE"]["data"][0]
print(mape)
assert mape == pytest.approx(75.6, abs=ABS_TOL)
@pytest.mark.integration
def test_lightgbm_multi_round(notebooks):
notebook_path = notebooks["lightgbm_multi_round"]
@@ -47,3 +62,17 @@ def test_dilatedcnn_multi_round(notebooks):
assert df.shape[0] == 1
mape = df.loc[df.name == "MAPE"]["data"][0]
assert mape == pytest.approx(37.7, abs=ABS_TOL)
@pytest.mark.integration
def test_autoarima_multi_round(notebooks):
notebook_path = notebooks["autoarima_multi_round"]
output_notebook_path = os.path.join(os.path.dirname(notebook_path), "output.ipynb")
pm.execute_notebook(
notebook_path, output_notebook_path, kernel_name="forecast_cpu", parameters=dict(N_SPLITS=2, STORE_SUBSET=True),
)
nb = sb.read_notebook(output_notebook_path)
df = nb.scraps.dataframe
assert df.shape[0] == 1
mape = df.loc[df.name == "MAPE"]["data"][0]
assert mape == pytest.approx(74.35, abs=ABS_TOL)
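The assertions above read exactly one scrap named `MAPE` from the executed notebook. On the notebook side, that scrap comes from scrapbook's `glue`, which is what produces the `application/scrapbook.scrap.json+json` output visible in the notebook diff earlier. A minimal sketch of the notebook-side call (the variable name and value are illustrative):

```python
# Inside the notebook: record the overall MAPE so the tests can read it back.
import scrapbook as sb

overall_mape = 74.35  # illustrative; the notebook computes this from its results
sb.glue("MAPE", overall_mape)
print("Overall MAPE is %.2f %%" % overall_mape)
```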

View File

@@ -1,5 +1,5 @@
REM Copyright (c) Microsoft Corporation.
REM Licensed under the MIT License.
REM Licensed under the MIT License.
REM Please follow instructions in this link
REM https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html

View File

@@ -1,7 +1,7 @@
#!/usr/bin/python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Licensed under the MIT License.
# This script creates yaml files to build conda environments
# For generating a conda file for running only python code:

View File

@@ -1,5 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# Licensed under the MIT License.
# This file outputs a requirements.txt based on the libraries defined in generate_conda_file.py
from generate_conda_file import (