added initial data prep notebook
This commit is contained in:
Parent 28ed8856d7
Commit 8a64d18408
@@ -0,0 +1,689 @@
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<i>Copyright (c) Microsoft Corporation.</i>\n",
|
||||
"\n",
|
||||
"<i>Licensed under the MIT License.</i>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Data Preparation for Retail Sales Forecasting\n",
|
||||
"\n",
|
"This notebook shows how to split the Orange Juice dataset into training and test sets for training and evaluating different retail sales forecasting methods.\n",
"\n",
"We use backtesting, a method that tests a predictive model on historical data, to evaluate the forecasting methods. Unlike standard [K-fold cross validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)), which randomly splits the data into K folds, we split the data so that every time stamp in the training set is no later than any time stamp in the test set.\n",
"\n"
|
||||
]
|
||||
},
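{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of this time-based splitting idea (a toy pandas example, not the `split_train_test` utility used later in this notebook), we can pick a cutoff week so that every training time stamp precedes every test time stamp:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration of a time-based split; the data and column names here are made up.\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\"week\": [40, 41, 42, 43, 44, 45], \"sales\": [10, 12, 9, 15, 14, 13]})\n",
"cutoff = 43  # last week included in the training set\n",
"toy_train = toy[toy[\"week\"] <= cutoff]\n",
"toy_test = toy[toy[\"week\"] > cutoff]\n",
"print(toy_train)\n",
"print(toy_test)"
]
},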
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Global Settings and Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2\n",
|
||||
"%matplotlib inline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"System version: 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42) \n",
|
||||
"[GCC 7.3.0]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"# import math\n",
|
||||
"# import datetime\n",
|
||||
"import warnings\n",
|
||||
"\n",
|
||||
"# import numpy as np\n",
|
||||
"# import pandas as pd\n",
|
||||
"# import tqdm as tqdm\n",
|
||||
"# import scrapbook as sb\n",
|
||||
"# import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"import forecasting_lib.common.forecast_settings as fs\n",
|
||||
"from forecasting_lib.common.utils import git_repo_path\n",
|
||||
"from forecasting_lib.dataset.ojdata import download_ojdata, split_train_test\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"warnings.filterwarnings(\"ignore\")\n",
|
||||
"\n",
|
||||
"print(\"System version: {}\".format(sys.version))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
"# Set DOWNLOAD_DATA to False if you've already downloaded the data\n",
|
||||
"DOWNLOAD_DATA = True\n",
|
||||
"\n",
|
||||
"# Data directory\n",
|
||||
"DATA_DIR = os.path.join(git_repo_path(), \"ojdata\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
"## Data Preparation\n",
"\n",
"We need to download the Orange Juice data and split it into training and test sets. By default, the following cells will download the data and then split it. If you have already downloaded the data, you can skip the download by setting `DOWNLOAD_DATA` to `False`.\n",
"\n",
"We store the training data and test data in dataframes. The training data includes `train_df` and `aux_df`, with `train_df` containing the historical sales up to week 135 (the time at which we make forecasts) and `aux_df` containing price/promotion information up until week 138. Here we assume that future price and promotion information up to a certain number of weeks ahead is predetermined and known. The test data is stored in `test_df`, which contains the sales of each product in weeks 137 and 138. Assuming the current week is week 135, our goal is to forecast the sales in weeks 137 and 138 using the training data. There is a one-week gap between the current week and the first target week of forecasting because, in practice, we want to leave time for planning inventory. The settings of the forecasting problem are defined in `forecasting_lib.common.forecast_settings`; we can change the setup (e.g., the forecast horizon or the range of the historical data) by updating these settings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Data already exists at the specified location.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"if DOWNLOAD_DATA:\n",
|
||||
" download_ojdata(DATA_DIR)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
"# Create a generator that yields (train_df, test_df, aux_df) for each forecast round\n",
"data_generator = split_train_test(DATA_DIR, fs)\n",
"\n",
"# Take the first-round split and turn the index columns back into regular columns\n",
"[train_df, test_df, aux_df] = next(data_generator)\n",
"train_df.reset_index(inplace=True)\n",
"test_df.reset_index(inplace=True)\n",
"aux_df.reset_index(inplace=True)"
|
||||
]
|
||||
},
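{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch based on the week numbers described above), we can print the week range of each dataframe and list the settings available in `fs`. With the default settings we expect training weeks 40-135, test weeks 137-138, and auxiliary weeks 40-138:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Training weeks: {} - {}\".format(train_df[\"week\"].min(), train_df[\"week\"].max()))\n",
"print(\"Testing weeks: {} - {}\".format(test_df[\"week\"].min(), test_df[\"week\"].max()))\n",
"print(\"Auxiliary weeks: {} - {}\".format(aux_df[\"week\"].min(), aux_df[\"week\"].max()))\n",
"\n",
"# List the settings defined in forecasting_lib.common.forecast_settings\n",
"# (assuming, as is typical, that they are module-level uppercase constants).\n",
"print([name for name in dir(fs) if name.isupper()])"
]
},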
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>store</th>\n",
|
||||
" <th>brand</th>\n",
|
||||
" <th>week</th>\n",
|
||||
" <th>logmove</th>\n",
|
||||
" <th>constant</th>\n",
|
||||
" <th>price1</th>\n",
|
||||
" <th>price2</th>\n",
|
||||
" <th>price3</th>\n",
|
||||
" <th>price4</th>\n",
|
||||
" <th>price5</th>\n",
|
||||
" <th>price6</th>\n",
|
||||
" <th>price7</th>\n",
|
||||
" <th>price8</th>\n",
|
||||
" <th>price9</th>\n",
|
||||
" <th>price10</th>\n",
|
||||
" <th>price11</th>\n",
|
||||
" <th>deal</th>\n",
|
||||
" <th>feat</th>\n",
|
||||
" <th>profit</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>40</td>\n",
|
||||
" <td>9.018695</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.060469</td>\n",
|
||||
" <td>0.060497</td>\n",
|
||||
" <td>0.042031</td>\n",
|
||||
" <td>0.029531</td>\n",
|
||||
" <td>0.049531</td>\n",
|
||||
" <td>0.053021</td>\n",
|
||||
" <td>0.038906</td>\n",
|
||||
" <td>0.041406</td>\n",
|
||||
" <td>0.028906</td>\n",
|
||||
" <td>0.024844</td>\n",
|
||||
" <td>0.038984</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>37.992326</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>46</td>\n",
|
||||
" <td>8.723231</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.060469</td>\n",
|
||||
" <td>0.060312</td>\n",
|
||||
" <td>0.045156</td>\n",
|
||||
" <td>0.046719</td>\n",
|
||||
" <td>0.049531</td>\n",
|
||||
" <td>0.047813</td>\n",
|
||||
" <td>0.045781</td>\n",
|
||||
" <td>0.027969</td>\n",
|
||||
" <td>0.042969</td>\n",
|
||||
" <td>0.042031</td>\n",
|
||||
" <td>0.038984</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>30.126667</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>47</td>\n",
|
||||
" <td>8.253228</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.060469</td>\n",
|
||||
" <td>0.060312</td>\n",
|
||||
" <td>0.045156</td>\n",
|
||||
" <td>0.046719</td>\n",
|
||||
" <td>0.037344</td>\n",
|
||||
" <td>0.053021</td>\n",
|
||||
" <td>0.045781</td>\n",
|
||||
" <td>0.041406</td>\n",
|
||||
" <td>0.048125</td>\n",
|
||||
" <td>0.032656</td>\n",
|
||||
" <td>0.038984</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>30.000000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>48</td>\n",
|
||||
" <td>8.987197</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.060469</td>\n",
|
||||
" <td>0.060312</td>\n",
|
||||
" <td>0.049844</td>\n",
|
||||
" <td>0.037344</td>\n",
|
||||
" <td>0.049531</td>\n",
|
||||
" <td>0.053021</td>\n",
|
||||
" <td>0.045781</td>\n",
|
||||
" <td>0.041406</td>\n",
|
||||
" <td>0.042344</td>\n",
|
||||
" <td>0.032656</td>\n",
|
||||
" <td>0.038984</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>29.950000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>50</td>\n",
|
||||
" <td>9.093357</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.060469</td>\n",
|
||||
" <td>0.060312</td>\n",
|
||||
" <td>0.043594</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.049531</td>\n",
|
||||
" <td>0.053021</td>\n",
|
||||
" <td>0.046648</td>\n",
|
||||
" <td>0.041406</td>\n",
|
||||
" <td>0.042344</td>\n",
|
||||
" <td>0.032656</td>\n",
|
||||
" <td>0.038203</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>29.920000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>...</th>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" <td>...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>84178</th>\n",
|
||||
" <td>137</td>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>131</td>\n",
|
||||
" <td>9.631154</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.027969</td>\n",
|
||||
" <td>0.051979</td>\n",
|
||||
" <td>0.049080</td>\n",
|
||||
" <td>0.039820</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.048395</td>\n",
|
||||
" <td>0.037500</td>\n",
|
||||
" <td>0.038906</td>\n",
|
||||
" <td>0.023281</td>\n",
|
||||
" <td>0.022187</td>\n",
|
||||
" <td>0.025703</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>17.170000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>84179</th>\n",
|
||||
" <td>137</td>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>132</td>\n",
|
||||
" <td>9.704061</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.030504</td>\n",
|
||||
" <td>0.051979</td>\n",
|
||||
" <td>0.043594</td>\n",
|
||||
" <td>0.033927</td>\n",
|
||||
" <td>0.033167</td>\n",
|
||||
" <td>0.045729</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.038906</td>\n",
|
||||
" <td>0.025313</td>\n",
|
||||
" <td>0.024844</td>\n",
|
||||
" <td>0.026328</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>18.630000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>84180</th>\n",
|
||||
" <td>137</td>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>133</td>\n",
|
||||
" <td>8.995165</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.043056</td>\n",
|
||||
" <td>0.051979</td>\n",
|
||||
" <td>0.045542</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.037205</td>\n",
|
||||
" <td>0.046579</td>\n",
|
||||
" <td>0.033470</td>\n",
|
||||
" <td>0.037969</td>\n",
|
||||
" <td>0.020156</td>\n",
|
||||
" <td>0.025625</td>\n",
|
||||
" <td>0.029609</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>25.350000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>84181</th>\n",
|
||||
" <td>137</td>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>134</td>\n",
|
||||
" <td>8.912473</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.039062</td>\n",
|
||||
" <td>0.049301</td>\n",
|
||||
" <td>0.049588</td>\n",
|
||||
" <td>0.032300</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.050937</td>\n",
|
||||
" <td>0.042031</td>\n",
|
||||
" <td>0.035781</td>\n",
|
||||
" <td>0.022031</td>\n",
|
||||
" <td>0.031094</td>\n",
|
||||
" <td>0.029609</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.0</td>\n",
|
||||
" <td>25.320000</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>84182</th>\n",
|
||||
" <td>137</td>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>135</td>\n",
|
||||
" <td>9.901886</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0.040473</td>\n",
|
||||
" <td>0.045729</td>\n",
|
||||
" <td>0.046957</td>\n",
|
||||
" <td>0.045223</td>\n",
|
||||
" <td>0.033493</td>\n",
|
||||
" <td>0.050937</td>\n",
|
||||
" <td>0.033941</td>\n",
|
||||
" <td>0.035781</td>\n",
|
||||
" <td>0.026406</td>\n",
|
||||
" <td>0.022969</td>\n",
|
||||
" <td>0.023359</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>5.350000</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"<p>84183 rows × 19 columns</p>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" store brand week logmove constant price1 price2 price3 \\\n",
|
||||
"0 2 1 40 9.018695 1 0.060469 0.060497 0.042031 \n",
|
||||
"1 2 1 46 8.723231 1 0.060469 0.060312 0.045156 \n",
|
||||
"2 2 1 47 8.253228 1 0.060469 0.060312 0.045156 \n",
|
||||
"3 2 1 48 8.987197 1 0.060469 0.060312 0.049844 \n",
|
||||
"4 2 1 50 9.093357 1 0.060469 0.060312 0.043594 \n",
|
||||
"... ... ... ... ... ... ... ... ... \n",
|
||||
"84178 137 11 131 9.631154 1 0.027969 0.051979 0.049080 \n",
|
||||
"84179 137 11 132 9.704061 1 0.030504 0.051979 0.043594 \n",
|
||||
"84180 137 11 133 8.995165 1 0.043056 0.051979 0.045542 \n",
|
||||
"84181 137 11 134 8.912473 1 0.039062 0.049301 0.049588 \n",
|
||||
"84182 137 11 135 9.901886 1 0.040473 0.045729 0.046957 \n",
|
||||
"\n",
|
||||
" price4 price5 price6 price7 price8 price9 price10 \\\n",
|
||||
"0 0.029531 0.049531 0.053021 0.038906 0.041406 0.028906 0.024844 \n",
|
||||
"1 0.046719 0.049531 0.047813 0.045781 0.027969 0.042969 0.042031 \n",
|
||||
"2 0.046719 0.037344 0.053021 0.045781 0.041406 0.048125 0.032656 \n",
|
||||
"3 0.037344 0.049531 0.053021 0.045781 0.041406 0.042344 0.032656 \n",
|
||||
"4 0.031094 0.049531 0.053021 0.046648 0.041406 0.042344 0.032656 \n",
|
||||
"... ... ... ... ... ... ... ... \n",
|
||||
"84178 0.039820 0.031094 0.048395 0.037500 0.038906 0.023281 0.022187 \n",
|
||||
"84179 0.033927 0.033167 0.045729 0.031094 0.038906 0.025313 0.024844 \n",
|
||||
"84180 0.031094 0.037205 0.046579 0.033470 0.037969 0.020156 0.025625 \n",
|
||||
"84181 0.032300 0.031094 0.050937 0.042031 0.035781 0.022031 0.031094 \n",
|
||||
"84182 0.045223 0.033493 0.050937 0.033941 0.035781 0.026406 0.022969 \n",
|
||||
"\n",
|
||||
" price11 deal feat profit \n",
|
||||
"0 0.038984 1 0.0 37.992326 \n",
|
||||
"1 0.038984 0 0.0 30.126667 \n",
|
||||
"2 0.038984 0 0.0 30.000000 \n",
|
||||
"3 0.038984 0 0.0 29.950000 \n",
|
||||
"4 0.038203 0 0.0 29.920000 \n",
|
||||
"... ... ... ... ... \n",
|
||||
"84178 0.025703 1 0.0 17.170000 \n",
|
||||
"84179 0.026328 1 1.0 18.630000 \n",
|
||||
"84180 0.029609 1 0.0 25.350000 \n",
|
||||
"84181 0.029609 1 0.0 25.320000 \n",
|
||||
"84182 0.023359 1 1.0 5.350000 \n",
|
||||
"\n",
|
||||
"[84183 rows x 19 columns]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"train_df"
|
||||
]
|
||||
},
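{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also take a quick look at the test data and the auxiliary price/promotion data produced by the split above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(test_df.head())\n",
"print(aux_df.head())"
]
},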
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
"## Split Data for Multi-Round Forecasting\n",
"\n",
"We can also create training data and test data for multi-round forecasting. In this case, we gradually increase the length of the training data at each round, which allows us to retrain the forecasting model and achieve more accurate forecasts. Using the default settings in `forecasting_lib/common/forecast_settings.py` and updating `NUM_ROUNDS` to 12, we can generate the training and test sets as follows:\n",
"\n",
"\n",
|
||||
"| **Round** | **Train period <br> start week** | **Train period <br> end week** | **Test period <br> start week** | **Test period <br> end week** |\n",
|
||||
"| -------- | --------------- | ------------------ | ------------------------- | ----------------------- |\n",
|
||||
"| 1 | 40 | 135 | 137 | 138 |\n",
|
||||
"| 2 | 40 | 137 | 139 | 140 |\n",
|
||||
"| 3 | 40 | 139 | 141 | 142 |\n",
|
||||
"| 4 | 40 | 141 | 143 | 144 |\n",
|
||||
"| 5 | 40 | 143 | 145 | 146 |\n",
|
||||
"| 6 | 40 | 145 | 147 | 148 |\n",
|
||||
"| 7 | 40 | 147 | 149 | 150 |\n",
|
||||
"| 8 | 40 | 149 | 151 | 152 |\n",
|
||||
"| 9 | 40 | 151 | 153 | 154 |\n",
|
||||
"| 10 | 40 | 153 | 155 | 156 |\n",
|
||||
"| 11 | 40 | 155 | 157 | 158 |\n",
|
||||
"| 12 | 40 | 157 | 159 | 160 |\n",
|
||||
"\n",
|
"The one-week gap between the training period and the test period gives store managers time to prepare stock based on the forecasted demand. In addition, we assume that the price, deal, and advertisement information up until the end week of each test period is available at each round."
|
||||
]
|
||||
},
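{
"cell_type": "markdown",
"metadata": {},
"source": [
"The schedule in the table above can be reproduced from the quantities stated in the text: the first training period ends at week 135, there is a one-week gap before a two-week test period, and the training period grows by two weeks per round. The next cell is only an illustrative sketch of that arithmetic; the actual boundaries are produced by `split_train_test` below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative reconstruction of the round schedule shown in the table above.\n",
"# The numbers (first training end week 135, one-week gap, two-week test period,\n",
"# two-week step between rounds) are taken from the text, not read from fs.\n",
"first_train_end, gap, horizon, step = 135, 1, 2, 2\n",
"for r in range(1, 13):\n",
"    train_end = first_train_end + (r - 1) * step\n",
"    test_start = train_end + gap + 1\n",
"    test_end = test_start + horizon - 1\n",
"    print(\"Round {}: train weeks 40-{}, test weeks {}-{}\".format(r, train_end, test_start, test_end))"
]
},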
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Training data size: (84183, 18)\n",
|
||||
"Testing data size: (1826, 18)\n",
|
||||
"Auxiliary data size: (86911, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 135\n",
|
||||
"Minimum testing week number: 137\n",
|
||||
"Maximum testing week number: 138\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 138\n",
|
||||
"\n",
|
||||
"Training data size: (85998, 18)\n",
|
||||
"Testing data size: (1793, 18)\n",
|
||||
"Auxiliary data size: (88704, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 137\n",
|
||||
"Minimum testing week number: 139\n",
|
||||
"Maximum testing week number: 140\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 140\n",
|
||||
"\n",
|
||||
"Training data size: (87802, 18)\n",
|
||||
"Testing data size: (1771, 18)\n",
|
||||
"Auxiliary data size: (90475, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 139\n",
|
||||
"Minimum testing week number: 141\n",
|
||||
"Maximum testing week number: 142\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 142\n",
|
||||
"\n",
|
||||
"Training data size: (89617, 18)\n",
|
||||
"Testing data size: (1749, 18)\n",
|
||||
"Auxiliary data size: (92224, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 141\n",
|
||||
"Minimum testing week number: 143\n",
|
||||
"Maximum testing week number: 144\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 144\n",
|
||||
"\n",
|
||||
"Training data size: (91333, 18)\n",
|
||||
"Testing data size: (1727, 18)\n",
|
||||
"Auxiliary data size: (93951, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 143\n",
|
||||
"Minimum testing week number: 145\n",
|
||||
"Maximum testing week number: 146\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 146\n",
|
||||
"\n",
|
||||
"Training data size: (93071, 18)\n",
|
||||
"Testing data size: (1749, 18)\n",
|
||||
"Auxiliary data size: (95700, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 145\n",
|
||||
"Minimum testing week number: 147\n",
|
||||
"Maximum testing week number: 148\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 148\n",
|
||||
"\n",
|
||||
"Training data size: (94842, 18)\n",
|
||||
"Testing data size: (1771, 18)\n",
|
||||
"Auxiliary data size: (97471, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 147\n",
|
||||
"Minimum testing week number: 149\n",
|
||||
"Maximum testing week number: 150\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 150\n",
|
||||
"\n",
|
||||
"Training data size: (96591, 18)\n",
|
||||
"Testing data size: (1738, 18)\n",
|
||||
"Auxiliary data size: (99209, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 149\n",
|
||||
"Minimum testing week number: 151\n",
|
||||
"Maximum testing week number: 152\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 152\n",
|
||||
"\n",
|
||||
"Training data size: (98340, 18)\n",
|
||||
"Testing data size: (1705, 18)\n",
|
||||
"Auxiliary data size: (100914, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 151\n",
|
||||
"Minimum testing week number: 153\n",
|
||||
"Maximum testing week number: 154\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 154\n",
|
||||
"\n",
|
||||
"Training data size: (100056, 18)\n",
|
||||
"Testing data size: (1705, 18)\n",
|
||||
"Auxiliary data size: (102619, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 153\n",
|
||||
"Minimum testing week number: 155\n",
|
||||
"Maximum testing week number: 156\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 156\n",
|
||||
"\n",
|
||||
"Training data size: (101772, 18)\n",
|
||||
"Testing data size: (1749, 18)\n",
|
||||
"Auxiliary data size: (104368, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 155\n",
|
||||
"Minimum testing week number: 157\n",
|
||||
"Maximum testing week number: 158\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 158\n",
|
||||
"\n",
|
||||
"Training data size: (103488, 18)\n",
|
||||
"Testing data size: (1771, 18)\n",
|
||||
"Auxiliary data size: (106139, 15)\n",
|
||||
"Minimum training week number: 40\n",
|
||||
"Maximum training week number: 157\n",
|
||||
"Minimum testing week number: 159\n",
|
||||
"Maximum testing week number: 160\n",
|
||||
"Minimum auxiliary week number: 40\n",
|
||||
"Maximum auxiliary week number: 160\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
"fs.NUM_ROUNDS = 12\n",
"# Generate the splits for all rounds and summarize the size and week range of each\n",
"for train_df, test_df, aux_df in split_train_test(DATA_DIR, fs):\n",
|
||||
" print(\"Training data size: {}\".format(train_df.shape))\n",
|
||||
" print(\"Testing data size: {}\".format(test_df.shape))\n",
|
||||
" print(\"Auxiliary data size: {}\".format(aux_df.shape))\n",
|
||||
" print(\"Minimum training week number: {}\".format(min(train_df[\"week\"])))\n",
|
||||
" print(\"Maximum training week number: {}\".format(max(train_df[\"week\"])))\n",
|
||||
" print(\"Minimum testing week number: {}\".format(min(test_df[\"week\"])))\n",
|
||||
" print(\"Maximum testing week number: {}\".format(max(test_df[\"week\"])))\n",
|
||||
" print(\"Minimum auxiliary week number: {}\".format(min(aux_df[\"week\"])))\n",
|
||||
" print(\"Maximum auxiliary week number: {}\".format(max(aux_df[\"week\"])))\n",
|
||||
" print(\"\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
"## Additional Reading\n",
"\n",
"\\[1\\] Christoph Bergmeir, Rob J. Hyndman, and Bonsoo Koo. 2018. A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction. Computational Statistics & Data Analysis, 120, pp. 70-83.<br>\n",
"\\[2\\] How To Backtest Machine Learning Models for Time Series Forecasting: https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/ <br>\n",
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6.7 64-bit ('forecast': conda)",
|
||||
"language": "python",
|
||||
"name": "python36764bitforecastconda6547c53842644e9994cb8960c1d7107f"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.7-final"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|