updates to documentation
This commit is contained in:
Parent
b9092831fe
Commit
a6e7d2fc4d
37
README.md
|
@ -1,4 +1,5 @@
|
|||
# Video Anomaly Detection - powered by Azure MLOps
|
||||
# Video Anomaly Detection - with Azure ML and MLOps
|
||||
|
||||
[![Build Status](https://dev.azure.com/aidemos/MLOps/_apis/build/status/Microsoft.MLOps_VideoAnomalyDetection?branchName=master)](https://dev.azure.com/aidemos/MLOps/_build/latest?definitionId=88?branchName=master)
|
||||
|
||||
Automatically detecting anomalous event sequences in videos is a challenging problem, but it also has broad applications across industry verticals.
|
||||
|
@ -27,22 +28,23 @@ You will learn:
|
|||
## Skills
|
||||
|
||||
1. Some familiarity with concepts and frameworks for neural networks:
|
||||
- Framework: [Keras](https://keras.io/)
|
||||
- Framework: [Keras](https://keras.io/) and [TensorFlow](https://www.tensorflow.org/)
|
||||
- Concepts: [convolutional](https://keras.io/layers/convolutional/), [recurrent](https://keras.io/layers/recurrent/), and [pooling](https://keras.io/layers/pooling/) layers.
|
||||
2. Knowledge of basic data science and machine learning concepts. [Here](https://www.youtube.com/watch?v=gNV9EqwXCpw) and [here](https://www.youtube.com/watch?v=GBDSBInvz08) you'll find short introductory material.
|
||||
3. Moderate skills in coding with Python and machine learning using Python. A good place to start is [here](https://www.youtube.com/watch?v=-Rf4fZDQ0yw&list=PLjgj6kdf_snaw8QnlhK5f3DzFDFKDU5f4).
|
||||
|
||||
## Software Dependencies
|
||||
|
||||
- Various python modules. We recommend working with a conda environement (see `environment.yml`) - [Documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). You might have to install at least [Miniconda](https://docs.conda.io/en/latest/miniconda.html) first.
|
||||
- VS code [https://code.visualstudio.com/](https://code.visualstudio.com/)
|
||||
- X2Go [https://wiki.x2go.org/doku.php](https://wiki.x2go.org/doku.php)
|
||||
- Various Python modules. We recommend working with a conda environment (see `config/environment.yml` and the [documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)). If you don't have conda yet, begin by installing [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
|
||||
- If you are using a [DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/):
|
||||
- We recommend [VS Code](https://code.visualstudio.com/) with the [Remote - SSH](https://code.visualstudio.com/docs/remote/ssh) extension.
|
||||
- We recommend [X2Go](https://wiki.x2go.org/doku.php) for remote desktop access.
|
||||
|
||||
|
||||
We found it useful to develop on a VM with a GPU and to connect to it using X2Go.
|
||||
|
||||
## Hardware Dependencies
|
||||
|
||||
A computer with a GPU, Standard NC6 sufficient, faster learning with NC6_v2/3 or ND6. [compare VM sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu)
|
||||
A computer with a GPU, for example an Azure VM. Compare VM [sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu) and [prices](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/).
|
||||
|
||||
## Dataset
|
||||
|
||||
|
@ -52,17 +54,18 @@ A computer with a GPU, Standard NC6 sufficient, faster learning with NC6_v2/3 or
|
|||
|
||||
### Getting Started
|
||||
|
||||
1. [Data Preparation](./docs/data_prep_w_pillow.md)
|
||||
2. [Model Development](./docs/model_development.md)
|
||||
3. [Hyperparameter Tuning](./docs/hyperparameter_tuning.md)
|
||||
4. [Anomaly Detection](./docs/anomaly_detection.md)
|
||||
5. [Deployment](./docs/deployment.md)
|
||||
1. [Data Preparation](./docs/data_prep_w_pillow.md) - Download and prepare data for training/testing.
|
||||
1. [Azure ML Configuration](./docs/aml_configuration.md) - Configure your Azure ML workspace.
|
||||
1. [AML Pipelines](./docs/aml_pipelines.md) - Automate data preparation, training, and re-training.
|
||||
1. [Deployment](./docs/deployment.md) - Deploy the trained models as a webservice.
|
||||
1. [MLOps](./docs/mlops.md) - How to quickly scale your solution with the MLOps extension for DevOps.
|
||||
|
||||
### Advanced Topics
|
||||
### Deep-dive
|
||||
|
||||
1. [Transfer learning](./docs/transfer_learning.md) - How to quickly retrain the model on new data.
|
||||
2. [AML Pipelines](./docs/aml_pipelines.md) - Use AML pipelines to scale your solution.
|
||||
3. [MLOps](./docs/mlops.md) - How to quickly scale your solution with the MLOps extension for DevOps.
|
||||
1. [Model Development](./docs/model_development.md) - Understand model architecture and training.
|
||||
1. [Fine Tuning](./docs/fine_tuning.md) - Perform transfer learning with a pretrained model on new data.
|
||||
1. [Hyperparameter tuning](./docs/hyperparameter_tuning.md) - Tune hyperparameters with HyperDrive.
|
||||
1. [Anomaly Detection](./docs/anomaly_detection.md) - Use model errors to detect anomalies.
|
||||
|
||||
## References / Resources
|
||||
|
||||
|
@ -76,6 +79,6 @@ A computer with a GPU, Standard NC6 sufficient, faster learning with NC6_v2/3 or
|
|||
year={2016}
|
||||
}
|
||||
```
|
||||
- Original Prednet implentation is on [github.com](https://coxlab.github.io/prednet/). Note, that the original implementation will only work in Python 2, but not in Python 3.
|
||||
- The original PredNet implementation is available on [github.com](https://coxlab.github.io/prednet/).
|
||||
|
||||
- An interesting blog post on [Self-Supervised Video Anomaly Detection](https://launchpad.ai/blog/video-anomaly-detection) by [Steve Shimozaki](https://launchpad.ai/blog?author=590f381c3e00bed4273e304b).
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
"""
|
||||
This file is meant to be run as a step in an AML pipeline created by
|
||||
pipelines_create.py, but it can also be run independently.
|
||||
pipelines_slave.py, but it can also be run independently.
|
||||
|
||||
It preprocesses video frames (stored as individual image files) so
|
||||
that they can be used for training a PredNet architecture.
|
||||
|
|
|
@ -1,51 +1,56 @@
|
|||
import sys
|
||||
import json
|
||||
from azureml.core.runconfig import CondaDependencies
|
||||
from azureml.core import Workspace
|
||||
from azureml.core.model import Model
|
||||
from azureml.core.image import ContainerImage, Image
|
||||
|
||||
|
||||
try:
|
||||
with open("aml_config/model.json") as f:
|
||||
config = json.load(f)
|
||||
except:
|
||||
    print('No new model to register, so there is no need to create a new scoring image')
|
||||
    # raise Exception('No new model to register as the production model performs better')
|
||||
sys.exit(0)
|
||||
|
||||
# initialize workspace from config.json
|
||||
ws = Workspace.from_config()
|
||||
|
||||
prednet_model_name = 'prednet_UCSDped1' #config['model_name']
|
||||
prednet_model_version = 2 # config['model_version']
|
||||
logistic_regression_model_name = 'logistic_regression' #config['model_name']
|
||||
logistic_regression_model_version = 1 # config['model_version']
|
||||
prednet_model_name = 'prednet_UCSDped1'
|
||||
prednet_model_version = 2
|
||||
logistic_regression_model_name = 'logistic_regression'
|
||||
logistic_regression_model_version = 1
|
||||
|
||||
|
||||
|
||||
cd = CondaDependencies.create(pip_packages=['keras==2.0.8', 'theano', 'tensorflow==1.8.0', 'matplotlib', 'hickle', 'pandas', 'azureml-sdk', "scikit-learn"])
|
||||
cd = CondaDependencies.create(
|
||||
pip_packages=[
|
||||
'keras',
|
||||
'tensorflow==1.15',
|
||||
'matplotlib',
|
||||
'hickle',
|
||||
'pandas',
|
||||
'azureml-sdk',
|
||||
"scikit-learn"])
|
||||
|
||||
cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')
|
||||
|
||||
prednet_model = Model(ws, name=prednet_model_name, version=prednet_model_version)
|
||||
logistic_regression_model = Model(ws, name=logistic_regression_model_name, version=logistic_regression_model_version)
|
||||
prednet_model = Model(
|
||||
ws,
|
||||
name=prednet_model_name,
|
||||
version=prednet_model_version)
|
||||
|
||||
img_config = ContainerImage.image_configuration(execution_script="score.py",
|
||||
logistic_regression_model = Model(
|
||||
ws,
|
||||
name=logistic_regression_model_name,
|
||||
version=logistic_regression_model_version)
|
||||
|
||||
img_config = ContainerImage.image_configuration(
|
||||
execution_script="score.py",
|
||||
runtime="python",
|
||||
conda_file="myenv.yml",
|
||||
dependencies=['prednet.py', 'keras_utils.py', 'aml_config/model.json'])
|
||||
dependencies=['prednet.py', 'aml_config/model.json'])
|
||||
|
||||
image_name = prednet_model_name.replace("_", "").lower()
|
||||
|
||||
print("Image name:", image_name)
|
||||
|
||||
image = Image.create(name = image_name,
|
||||
models = [prednet_model],
|
||||
image_config = img_config,
|
||||
workspace = ws)
|
||||
image = Image.create(
|
||||
name=image_name,
|
||||
models=[prednet_model],
|
||||
image_config=img_config,
|
||||
workspace=ws)
|
||||
|
||||
image.wait_for_creation(show_output = True)
|
||||
image.wait_for_creation(show_output=True)
|
||||
|
||||
|
||||
if image.creation_state != 'Succeeded':
|
||||
|
@ -58,6 +63,6 @@ image_json = {}
|
|||
image_json['image_name'] = image.name
|
||||
image_json['image_version'] = image.version
|
||||
image_json['image_location'] = image.image_location
|
||||
with open('aml_config/image.json', 'w') as outfile:
|
||||
json.dump(image_json,outfile)
|
||||
|
||||
with open('aml_config/image.json', 'w') as outfile:
|
||||
json.dump(image_json, outfile)
|
||||
|
|
|
@ -1,58 +0,0 @@
|
|||
import os
|
||||
import numpy as np
|
||||
|
||||
from keras import backend as K
|
||||
from keras.legacy.interfaces import generate_legacy_interface, recurrent_args_preprocessor
|
||||
from keras.models import model_from_json
|
||||
|
||||
legacy_prednet_support = generate_legacy_interface(
|
||||
allowed_positional_args=['stack_sizes', 'R_stack_sizes',
|
||||
'A_filt_sizes', 'Ahat_filt_sizes', 'R_filt_sizes'],
|
||||
conversions=[('dim_ordering', 'data_format'),
|
||||
('consume_less', 'implementation')],
|
||||
value_conversions={'dim_ordering': {'tf': 'channels_last',
|
||||
'th': 'channels_first',
|
||||
'default': None},
|
||||
'consume_less': {'cpu': 0,
|
||||
'mem': 1,
|
||||
'gpu': 2}},
|
||||
preprocessor=recurrent_args_preprocessor)
|
||||
|
||||
# Convert old Keras (1.2) json models and weights to Keras 2.0
|
||||
def convert_model_to_keras2(old_json_file, old_weights_file, new_json_file, new_weights_file):
|
||||
from prednet import PredNet
|
||||
# If using tensorflow, it doesn't allow you to load the old weights.
|
||||
if K.backend() != 'theano':
|
||||
os.environ['KERAS_BACKEND'] = backend
|
||||
reload(K)
|
||||
|
||||
f = open(old_json_file, 'r')
|
||||
json_string = f.read()
|
||||
f.close()
|
||||
model = model_from_json(json_string, custom_objects = {'PredNet': PredNet})
|
||||
model.load_weights(old_weights_file)
|
||||
|
||||
weights = model.layers[1].get_weights()
|
||||
if weights[0].shape[0] == model.layers[1].stack_sizes[1]:
|
||||
for i, w in enumerate(weights):
|
||||
if w.ndim == 4:
|
||||
weights[i] = np.transpose(w, (2, 3, 1, 0))
|
||||
model.set_weights(weights)
|
||||
|
||||
model.save_weights(new_weights_file)
|
||||
json_string = model.to_json()
|
||||
with open(new_json_file, "w") as f:
|
||||
f.write(json_string)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
old_dir = './model_data/'
|
||||
new_dir = './model_data_keras2/'
|
||||
if not os.path.exists(new_dir):
|
||||
os.mkdir(new_dir)
|
||||
for w_tag in ['', '-Lall', '-extrapfinetuned']:
|
||||
m_tag = '' if w_tag == '-Lall' else w_tag
|
||||
convert_model_to_keras2(old_dir + 'prednet_kitti_model' + m_tag + '.json',
|
||||
old_dir + 'prednet_kitti_weights' + w_tag + '.hdf5',
|
||||
new_dir + 'prednet_kitti_model' + m_tag + '.json',
|
||||
new_dir + 'prednet_kitti_weights' + w_tag + '.hdf5')
|
|
@ -2,13 +2,25 @@ import numpy as np
|
|||
|
||||
from keras import backend as K
|
||||
from keras import activations
|
||||
from keras.layers import Recurrent
|
||||
from keras.layers import Conv2D, UpSampling2D, MaxPooling2D
|
||||
|
||||
# from keras.layers import
|
||||
from keras.layers import (
|
||||
Recurrent,
|
||||
Conv2D,
|
||||
UpSampling2D,
|
||||
MaxPooling2D,
|
||||
Dense,
|
||||
Subtract,
|
||||
Concatenate,
|
||||
Flatten,
|
||||
)
|
||||
from keras.engine import InputSpec
|
||||
from keras_utils import legacy_prednet_support
|
||||
# from keras_utils import legacy_prednet_support
|
||||
|
||||
|
||||
|
||||
class PredNet(Recurrent):
|
||||
'''PredNet architecture - Lotter 2016.
|
||||
"""PredNet architecture - Lotter 2016.
|
||||
Stacked convolutional LSTM inspired by predictive coding principles.
|
||||
|
||||
# Arguments
|
||||
|
@ -58,36 +70,72 @@ class PredNet(Recurrent):
|
|||
- [Long short-term memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf)
|
||||
- [Convolutional LSTM network: a machine learning approach for precipitation nowcasting](http://arxiv.org/abs/1506.04214)
|
||||
- [Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects](http://www.nature.com/neuro/journal/v2/n1/pdf/nn0199_79.pdf)
|
||||
'''
|
||||
@legacy_prednet_support
|
||||
def __init__(self, stack_sizes, R_stack_sizes,
|
||||
A_filt_sizes, Ahat_filt_sizes, R_filt_sizes,
|
||||
pixel_max=1., error_activation='relu', A_activation='relu',
|
||||
LSTM_activation='tanh', LSTM_inner_activation='hard_sigmoid',
|
||||
output_mode='error', extrap_start_time=None,
|
||||
data_format=K.image_data_format(), **kwargs):
|
||||
"""
|
||||
|
||||
# @legacy_prednet_support
|
||||
def __init__(
|
||||
self,
|
||||
stack_sizes,
|
||||
R_stack_sizes,
|
||||
A_filt_sizes,
|
||||
Ahat_filt_sizes,
|
||||
R_filt_sizes,
|
||||
pixel_max=1.0,
|
||||
error_activation="relu",
|
||||
A_activation="relu",
|
||||
LSTM_activation="tanh",
|
||||
LSTM_inner_activation="hard_sigmoid",
|
||||
output_mode="error",
|
||||
extrap_start_time=None,
|
||||
data_format=K.image_data_format(),
|
||||
**kwargs
|
||||
):
|
||||
self.stack_sizes = stack_sizes
|
||||
self.nb_layers = len(stack_sizes) # the number of layers (w/ each layer containing R, A_hat, A, E)
|
||||
self.nb_layers = len(
|
||||
stack_sizes
|
||||
) # the number of layers (w/ each layer containing R, A_hat, A, E)
|
||||
|
||||
# now let's assert that the arguments are consistent with the number of layers
|
||||
assert len(R_stack_sizes) == self.nb_layers, 'len(R_stack_sizes) must equal len(stack_sizes)'
|
||||
assert (
|
||||
len(R_stack_sizes) == self.nb_layers
|
||||
), "len(R_stack_sizes) must equal len(stack_sizes)"
|
||||
self.R_stack_sizes = R_stack_sizes
|
||||
assert len(A_filt_sizes) == (self.nb_layers - 1), 'len(A_filt_sizes) must equal len(stack_sizes) - 1'
|
||||
assert len(A_filt_sizes) == (
|
||||
self.nb_layers - 1
|
||||
), "len(A_filt_sizes) must equal len(stack_sizes) - 1"
|
||||
self.A_filt_sizes = A_filt_sizes
|
||||
assert len(Ahat_filt_sizes) == self.nb_layers, 'len(Ahat_filt_sizes) must equal len(stack_sizes)'
|
||||
assert (
|
||||
len(Ahat_filt_sizes) == self.nb_layers
|
||||
), "len(Ahat_filt_sizes) must equal len(stack_sizes)"
|
||||
self.Ahat_filt_sizes = Ahat_filt_sizes
|
||||
assert len(R_filt_sizes) == (self.nb_layers), 'len(R_filt_sizes) must equal len(stack_sizes)'
|
||||
assert len(R_filt_sizes) == (
|
||||
self.nb_layers
|
||||
), "len(R_filt_sizes) must equal len(stack_sizes)"
|
||||
self.R_filt_sizes = R_filt_sizes
|
||||
|
||||
self.pixel_max = pixel_max
|
||||
self.error_activation = activations.get(error_activation)
|
||||
self.A_activation = activations.get(A_activation)
|
||||
self.LSTM_activation = activations.get(LSTM_activation) # act func for output (c)
|
||||
self.LSTM_inner_activation = activations.get(LSTM_inner_activation) # act func for f,i,o gates
|
||||
self.LSTM_activation = activations.get(
|
||||
LSTM_activation
|
||||
) # act func for output (c)
|
||||
self.LSTM_inner_activation = activations.get(
|
||||
LSTM_inner_activation
|
||||
) # act func for f,i,o gates
|
||||
|
||||
default_output_modes = ['prediction', 'error', 'all']
|
||||
layer_output_modes = [layer + str(n) for n in range(self.nb_layers) for layer in ['R', 'E', 'A', 'Ahat']]
|
||||
assert output_mode in default_output_modes + layer_output_modes, 'Invalid output_mode: ' + str(output_mode)
|
||||
default_output_modes = [
|
||||
"prediction",
|
||||
"error",
|
||||
"all",
|
||||
]
|
||||
layer_output_modes = [
|
||||
layer + str(n)
|
||||
for n in range(self.nb_layers)
|
||||
for layer in ["R", "E", "A", "Ahat"]
|
||||
]
|
||||
assert (
|
||||
output_mode in default_output_modes + layer_output_modes
|
||||
), "Invalid output_mode: " + str(output_mode)
|
||||
self.output_mode = output_mode
|
||||
if self.output_mode in layer_output_modes:
|
||||
self.output_layer_type = self.output_mode[:-1]
|
||||
|
@ -97,11 +145,14 @@ class PredNet(Recurrent):
|
|||
self.output_layer_num = None
|
||||
self.extrap_start_time = extrap_start_time
|
||||
|
||||
assert data_format in {'channels_last', 'channels_first'}, 'data_format must be in {channels_last, channels_first}'
|
||||
assert data_format in {
|
||||
"channels_last",
|
||||
"channels_first",
|
||||
}, "data_format must be in {channels_last, channels_first}"
|
||||
self.data_format = data_format
|
||||
self.channel_axis = -3 if data_format == 'channels_first' else -1
|
||||
self.row_axis = -2 if data_format == 'channels_first' else -3
|
||||
self.column_axis = -1 if data_format == 'channels_first' else -2
|
||||
self.channel_axis = -3 if data_format == "channels_first" else -1
|
||||
self.row_axis = -2 if data_format == "channels_first" else -3
|
||||
self.column_axis = -1 if data_format == "channels_first" else -2
|
||||
super(PredNet, self).__init__(**kwargs)
|
||||
self.input_spec = [InputSpec(ndim=5)]
|
||||
|
||||
|
@ -115,19 +166,29 @@ class PredNet(Recurrent):
|
|||
tuple -- dimensions of output
|
||||
"""
|
||||
|
||||
if self.output_mode == 'prediction':
|
||||
if self.output_mode == "prediction":
|
||||
out_shape = input_shape[2:]
|
||||
elif self.output_mode == 'error':
|
||||
elif self.output_mode == "error":
|
||||
out_shape = (self.nb_layers,)
|
||||
elif self.output_mode == 'all':
|
||||
elif self.output_mode == "all":
|
||||
out_shape = (np.prod(input_shape[2:]) + self.nb_layers,)
|
||||
else:
|
||||
stack_str = 'R_stack_sizes' if self.output_layer_type == 'R' else 'stack_sizes'
|
||||
stack_mult = 2 if self.output_layer_type == 'E' else 1
|
||||
out_stack_size = stack_mult * getattr(self, stack_str)[self.output_layer_num]
|
||||
out_nb_row = input_shape[self.row_axis] / 2**self.output_layer_num
|
||||
out_nb_col = input_shape[self.column_axis] / 2**self.output_layer_num
|
||||
if self.data_format == 'channels_first':
|
||||
stack_str = (
|
||||
"R_stack_sizes"
|
||||
if self.output_layer_type == "R"
|
||||
else "stack_sizes"
|
||||
)
|
||||
stack_mult = 2 if self.output_layer_type == "E" else 1
|
||||
out_stack_size = (
|
||||
stack_mult * getattr(self, stack_str)[self.output_layer_num]
|
||||
)
|
||||
out_nb_row = (
|
||||
input_shape[self.row_axis] / 2 ** self.output_layer_num
|
||||
)
|
||||
out_nb_col = (
|
||||
input_shape[self.column_axis] / 2 ** self.output_layer_num
|
||||
)
|
||||
if self.data_format == "channels_first":
|
||||
out_shape = (out_stack_size, out_nb_row, out_nb_col)
|
||||
else:
|
||||
out_shape = (out_nb_row, out_nb_col, out_stack_size)
|
||||
|
@ -142,102 +203,191 @@ class PredNet(Recurrent):
|
|||
init_nb_row = input_shape[self.row_axis] # number of rows in input
|
||||
init_nb_col = input_shape[self.column_axis] # number of cols in input
|
||||
|
||||
base_initial_state = K.zeros_like(x) # (samples, timesteps) + image_shape
|
||||
non_channel_axis = -1 if self.data_format == 'channels_first' else -2
|
||||
base_initial_state = K.zeros_like(
|
||||
x
|
||||
) # (samples, timesteps) + image_shape
|
||||
non_channel_axis = -1 if self.data_format == "channels_first" else -2
|
||||
for _ in range(2):
|
||||
base_initial_state = K.sum(base_initial_state, axis=non_channel_axis)
|
||||
base_initial_state = K.sum(base_initial_state, axis=1) # (samples, nb_channels)
|
||||
base_initial_state = K.sum(
|
||||
base_initial_state, axis=non_channel_axis
|
||||
)
|
||||
base_initial_state = K.sum(
|
||||
base_initial_state, axis=1
|
||||
) # (samples, nb_channels)
|
||||
|
||||
initial_states = []
|
||||
states_to_pass = ['r', 'c', 'e'] # (r)epresentational layer, (c)ontext (LSTM state), (e) error
|
||||
nlayers_to_pass = {u: self.nb_layers for u in states_to_pass} # how many layers of each state to pass
|
||||
states_to_pass = [
|
||||
"r",
|
||||
"c",
|
||||
"e",
|
||||
] # (r)epresentational layer, (c)ontext (LSTM state), (e) error
|
||||
nlayers_to_pass = {
|
||||
u: self.nb_layers for u in states_to_pass
|
||||
} # how many layers of each state to pass
|
||||
if self.extrap_start_time is not None:
|
||||
states_to_pass.append('ahat') # pass prediction in states so can use as actual for t+1 when extrapolating
|
||||
nlayers_to_pass['ahat'] = 1
|
||||
states_to_pass.append(
|
||||
"ahat"
|
||||
) # pass prediction in states so can use as actual for t+1 when extrapolating
|
||||
nlayers_to_pass["ahat"] = 1
|
||||
for u in states_to_pass:
|
||||
for l in range(nlayers_to_pass[u]):
|
||||
ds_factor = 2 ** l
|
||||
nb_row = init_nb_row // ds_factor
|
||||
nb_col = init_nb_col // ds_factor
|
||||
if u in ['r', 'c']:
|
||||
if u in ["r", "c"]:
|
||||
stack_size = self.R_stack_sizes[l]
|
||||
elif u == 'e':
|
||||
elif u == "e":
|
||||
stack_size = 2 * self.stack_sizes[l]
|
||||
elif u == 'ahat':
|
||||
elif u == "ahat":
|
||||
stack_size = self.stack_sizes[l]
|
||||
output_size = stack_size * nb_row * nb_col # flattened size
|
||||
|
||||
reducer = K.zeros((input_shape[self.channel_axis], output_size)) # (nb_channels, output_size)
|
||||
initial_state = K.dot(base_initial_state, reducer) # (samples, output_size)
|
||||
if self.data_format == 'channels_first':
|
||||
reducer = K.zeros(
|
||||
(input_shape[self.channel_axis], output_size)
|
||||
) # (nb_channels, output_size)
|
||||
initial_state = K.dot(
|
||||
base_initial_state, reducer
|
||||
) # (samples, output_size)
|
||||
if self.data_format == "channels_first":
|
||||
output_shp = (-1, stack_size, nb_row, nb_col)
|
||||
else:
|
||||
output_shp = (-1, nb_row, nb_col, stack_size)
|
||||
initial_state = K.reshape(initial_state, output_shp)
|
||||
initial_states += [initial_state]
|
||||
|
||||
if K._BACKEND == 'theano':
|
||||
from theano import tensor as T
|
||||
# There is a known issue in the Theano scan op when dealing with inputs whose shape is 1 along a dimension.
|
||||
# In our case, this is a problem when training on grayscale images, and the below line fixes it.
|
||||
initial_states = [T.unbroadcast(init_state, 0, 1) for init_state in initial_states]
|
||||
|
||||
if self.extrap_start_time is not None:
|
||||
initial_states += [K.variable(0, int if K.backend() != 'tensorflow' else 'int32')] # the last state will correspond to the current timestep
|
||||
initial_states += [
|
||||
K.variable(0, int if K.backend() != "tensorflow" else "int32")
|
||||
] # the last state will correspond to the current timestep
|
||||
return initial_states
|
||||
|
||||
def build(self, input_shape):
|
||||
self.input_spec = [InputSpec(shape=input_shape)]
|
||||
self.conv_layers = {c: [] for c in ['i', 'f', 'c', 'o', 'a', 'ahat']} # LSTM: (i)nput, (f)orget, (c)ell, (o) output; a, a_hat
|
||||
self.conv_layers = {
|
||||
c: [] for c in ["i", "f", "c", "o", "a", "ahat"]
|
||||
} # LSTM: (i)nput, (f)orget, (c)ell, (o) output; a, a_hat
|
||||
self.e_up_layers = []
|
||||
self.e_down_layers = []
|
||||
self.e_layers = []
|
||||
|
||||
for l in range(self.nb_layers):
|
||||
for c in ['i', 'f', 'c', 'o']:
|
||||
act = self.LSTM_activation if c == 'c' else self.LSTM_inner_activation
|
||||
self.conv_layers[c].append(Conv2D(self.R_stack_sizes[l], self.R_filt_sizes[l], padding='same', activation=act, data_format=self.data_format))
|
||||
for c in ["i", "f", "c", "o"]:
|
||||
act = (
|
||||
self.LSTM_activation
|
||||
if c == "c"
|
||||
else self.LSTM_inner_activation
|
||||
)
|
||||
self.conv_layers[c].append(
|
||||
Conv2D(
|
||||
self.R_stack_sizes[l],
|
||||
self.R_filt_sizes[l],
|
||||
padding="same",
|
||||
activation=act,
|
||||
data_format=self.data_format,
|
||||
)
|
||||
)
|
||||
|
||||
act = 'relu' if l == 0 else self.A_activation
|
||||
self.conv_layers['ahat'].append(Conv2D(self.stack_sizes[l], self.Ahat_filt_sizes[l], padding='same', activation=act, data_format=self.data_format))
|
||||
act = "relu" if l == 0 else self.A_activation
|
||||
self.conv_layers["ahat"].append(
|
||||
Conv2D(
|
||||
self.stack_sizes[l],
|
||||
self.Ahat_filt_sizes[l],
|
||||
padding="same",
|
||||
activation=act,
|
||||
data_format=self.data_format,
|
||||
)
|
||||
)
|
||||
|
||||
if l < self.nb_layers - 1:
|
||||
self.conv_layers['a'].append(Conv2D(self.stack_sizes[l+1], self.A_filt_sizes[l], padding='same', activation=self.A_activation, data_format=self.data_format))
|
||||
self.conv_layers["a"].append(
|
||||
Conv2D(
|
||||
self.stack_sizes[l + 1],
|
||||
self.A_filt_sizes[l],
|
||||
padding="same",
|
||||
activation=self.A_activation,
|
||||
data_format=self.data_format,
|
||||
)
|
||||
)
|
||||
|
||||
self.upsample = UpSampling2D(data_format=self.data_format)
|
||||
self.pool = MaxPooling2D(data_format=self.data_format)
|
||||
|
||||
self.trainable_weights = []
|
||||
nb_row, nb_col = (input_shape[-2], input_shape[-1]) if self.data_format == 'channels_first' else (input_shape[-3], input_shape[-2])
|
||||
nb_row, nb_col = (
|
||||
(input_shape[-2], input_shape[-1])
|
||||
if self.data_format == "channels_first"
|
||||
else (input_shape[-3], input_shape[-2])
|
||||
)
|
||||
for c in sorted(self.conv_layers.keys()):
|
||||
for l in range(len(self.conv_layers[c])):
|
||||
ds_factor = 2 ** l
|
||||
if c == 'ahat':
|
||||
if c == "ahat":
|
||||
nb_channels = self.R_stack_sizes[l]
|
||||
elif c == 'a':
|
||||
elif c == "a":
|
||||
nb_channels = 2 * self.R_stack_sizes[l]
|
||||
else:
|
||||
nb_channels = self.stack_sizes[l] * 2 + self.R_stack_sizes[l]
|
||||
nb_channels = (
|
||||
self.stack_sizes[l] * 2 + self.R_stack_sizes[l]
|
||||
)
|
||||
if l < self.nb_layers - 1:
|
||||
nb_channels += self.R_stack_sizes[l+1]
|
||||
in_shape = (input_shape[0], nb_channels, nb_row // ds_factor, nb_col // ds_factor)
|
||||
if self.data_format == 'channels_last': in_shape = (in_shape[0], in_shape[2], in_shape[3], in_shape[1])
|
||||
with K.name_scope('layer_' + c + '_' + str(l)):
|
||||
self.conv_layers[c][l].build(in_shape)
|
||||
self.trainable_weights += self.conv_layers[c][l].trainable_weights
|
||||
nb_channels += self.R_stack_sizes[l + 1]
|
||||
in_shape = (
|
||||
input_shape[0],
|
||||
nb_channels,
|
||||
nb_row // ds_factor,
|
||||
nb_col // ds_factor,
|
||||
)
|
||||
if self.data_format == "channels_last":
|
||||
in_shape = (
|
||||
in_shape[0],
|
||||
in_shape[2],
|
||||
in_shape[3],
|
||||
in_shape[1],
|
||||
)
|
||||
|
||||
self.states = [None] * self.nb_layers*3
|
||||
if c == "ahat":
|
||||
# print("layer: %d, shape: %s" % (l, in_shape))
|
||||
self.e_down_layers.append(Subtract())
|
||||
self.e_up_layers.append(Subtract())
|
||||
self.e_layers.append(Concatenate())
|
||||
with K.name_scope("layer_e_down_" + str(l)):
|
||||
self.e_down_layers[-1].build([in_shape, in_shape])
|
||||
with K.name_scope("layer_e_up_" + str(l)):
|
||||
self.e_up_layers[-1].build([in_shape, in_shape])
|
||||
with K.name_scope("layer_e_" + str(l)):
|
||||
self.e_layers[-1].build([in_shape, in_shape])
|
||||
with K.name_scope("layer_" + c + "_" + str(l)):
|
||||
self.conv_layers[c][l].build(in_shape)
|
||||
self.trainable_weights += self.conv_layers[c][
|
||||
l
|
||||
].trainable_weights
|
||||
|
||||
self.states = [None] * self.nb_layers * 3
|
||||
|
||||
if self.extrap_start_time is not None:
|
||||
self.t_extrap = K.variable(self.extrap_start_time, int if K.backend() != 'tensorflow' else 'int32')
|
||||
self.t_extrap = K.variable(
|
||||
self.extrap_start_time,
|
||||
int if K.backend() != "tensorflow" else "int32",
|
||||
)
|
||||
self.states += [None] * 2 # [previous frame prediction, timestep]
|
||||
|
||||
def step(self, a, states):
|
||||
# get the states of states_to_pass from t-1 (previous time step)
|
||||
r_tm1 = states[:self.nb_layers] # nb_layers times representational layer
|
||||
c_tm1 = states[self.nb_layers:2*self.nb_layers] # nb_layers times LSTM state
|
||||
e_tm1 = states[2*self.nb_layers:3*self.nb_layers] # nb_layers times error
|
||||
r_tm1 = states[
|
||||
: self.nb_layers
|
||||
] # nb_layers times representational layer
|
||||
c_tm1 = states[
|
||||
self.nb_layers : 2 * self.nb_layers
|
||||
] # nb_layers times LSTM state
|
||||
e_tm1 = states[
|
||||
2 * self.nb_layers : 3 * self.nb_layers
|
||||
] # nb_layers times error
|
||||
|
||||
if self.extrap_start_time is not None:
|
||||
t = states[-1]
|
||||
a = K.switch(t >= self.t_extrap, states[-2], a) # if past self.extrap_start_time, the previous prediction will be treated as the actual
|
||||
a = K.switch(
|
||||
t >= self.t_extrap, states[-2], a
|
||||
) # if past self.extrap_start_time, the previous prediction will be treated as the actual
|
||||
|
||||
c = []
|
||||
r = []
|
||||
|
@ -254,76 +404,91 @@ class PredNet(Recurrent):
|
|||
inputs = K.concatenate(inputs, axis=self.channel_axis)
|
||||
|
||||
# update state of i, f, o, c
|
||||
i = self.conv_layers['i'][l].call(inputs)
|
||||
f = self.conv_layers['f'][l].call(inputs)
|
||||
o = self.conv_layers['o'][l].call(inputs)
|
||||
i = self.conv_layers["i"][l].call(inputs)
|
||||
f = self.conv_layers["f"][l].call(inputs)
|
||||
o = self.conv_layers["o"][l].call(inputs)
|
||||
# update LSTM state: first forget, then add input * state of t-1
|
||||
_c = f * c_tm1[l] + i * self.conv_layers['c'][l].call(inputs)
|
||||
_c = f * c_tm1[l] + i * self.conv_layers["c"][l].call(inputs)
|
||||
# update r, output * lstm-state
|
||||
_r = o * self.LSTM_activation(_c)
|
||||
c.insert(0, _c)
|
||||
r.insert(0, _r)
|
||||
|
||||
if l > 0: # create pointer to representational layer from next deepest layer
|
||||
if (
|
||||
l > 0
|
||||
): # create pointer to representational layer from next deepest layer
|
||||
r_up = self.upsample.call(_r)
|
||||
|
||||
# Update feedforward path starting from the bottom
|
||||
for l in range(self.nb_layers):
|
||||
ahat = self.conv_layers['ahat'][l].call(r[l])
|
||||
ahat = self.conv_layers["ahat"][l].call(r[l])
|
||||
if l == 0:
|
||||
ahat = K.minimum(ahat, self.pixel_max)
|
||||
frame_prediction = ahat
|
||||
|
||||
# compute errors
|
||||
e_up = self.error_activation(ahat - a)
|
||||
e_down = self.error_activation(a - ahat)
|
||||
|
||||
e.append(K.concatenate((e_up, e_down), axis=self.channel_axis))
|
||||
e_up = self.error_activation(self.e_up_layers[l].call([ahat, a]))
|
||||
e_down = self.error_activation(
|
||||
self.e_down_layers[l].call([a, ahat])
|
||||
)
|
||||
e.append(self.e_layers[l].call([e_up, e_down]))
|
||||
|
||||
if self.output_layer_num == l:
|
||||
if self.output_layer_type == 'A':
|
||||
if self.output_layer_type == "A":
|
||||
output = a
|
||||
elif self.output_layer_type == 'Ahat':
|
||||
elif self.output_layer_type == "Ahat":
|
||||
output = ahat
|
||||
elif self.output_layer_type == 'R':
|
||||
elif self.output_layer_type == "R":
|
||||
output = r[l]
|
||||
elif self.output_layer_type == 'E':
|
||||
elif self.output_layer_type == "E":
|
||||
output = e[l]
|
||||
|
||||
if l < self.nb_layers - 1:
|
||||
a = self.conv_layers['a'][l].call(e[l])
|
||||
a = self.conv_layers["a"][l].call(e[l])
|
||||
a = self.pool.call(a) # target for next layer
|
||||
|
||||
if self.output_layer_type is None:
|
||||
if self.output_mode == 'prediction':
|
||||
if self.output_mode == "prediction":
|
||||
output = frame_prediction
|
||||
else:
|
||||
for l in range(self.nb_layers):
|
||||
layer_error = K.mean(K.batch_flatten(e[l]), axis=-1, keepdims=True)
|
||||
all_error = layer_error if l == 0 else K.concatenate((all_error, layer_error), axis=-1)
|
||||
if self.output_mode == 'error':
|
||||
layer_error = K.mean(
|
||||
K.batch_flatten(e[l]), axis=-1, keepdims=True
|
||||
)
|
||||
all_error = (
|
||||
layer_error
|
||||
if l == 0
|
||||
else K.concatenate((all_error, layer_error), axis=-1)
|
||||
)
|
||||
if self.output_mode == "error":
|
||||
output = all_error
|
||||
else:
|
||||
output = K.concatenate((K.batch_flatten(frame_prediction), all_error), axis=-1)
|
||||
output = K.concatenate(
|
||||
(K.batch_flatten(frame_prediction), all_error), axis=-1
|
||||
)
|
||||
|
||||
states = r + c + e
|
||||
if self.extrap_start_time is not None:
|
||||
states += [frame_prediction, t + 1]
|
||||
|
||||
print(K.shape(output))
|
||||
return output, states
|
||||
|
||||
def get_config(self):
|
||||
config = {'stack_sizes': self.stack_sizes,
|
||||
'R_stack_sizes': self.R_stack_sizes,
|
||||
'A_filt_sizes': self.A_filt_sizes,
|
||||
'Ahat_filt_sizes': self.Ahat_filt_sizes,
|
||||
'R_filt_sizes': self.R_filt_sizes,
|
||||
'pixel_max': self.pixel_max,
|
||||
'error_activation': self.error_activation.__name__,
|
||||
'A_activation': self.A_activation.__name__,
|
||||
'LSTM_activation': self.LSTM_activation.__name__,
|
||||
'LSTM_inner_activation': self.LSTM_inner_activation.__name__,
|
||||
'data_format': self.data_format,
|
||||
'extrap_start_time': self.extrap_start_time,
|
||||
'output_mode': self.output_mode}
|
||||
config = {
|
||||
"stack_sizes": self.stack_sizes,
|
||||
"R_stack_sizes": self.R_stack_sizes,
|
||||
"A_filt_sizes": self.A_filt_sizes,
|
||||
"Ahat_filt_sizes": self.Ahat_filt_sizes,
|
||||
"R_filt_sizes": self.R_filt_sizes,
|
||||
"pixel_max": self.pixel_max,
|
||||
"error_activation": self.error_activation.__name__,
|
||||
"A_activation": self.A_activation.__name__,
|
||||
"LSTM_activation": self.LSTM_activation.__name__,
|
||||
"LSTM_inner_activation": self.LSTM_inner_activation.__name__,
|
||||
"data_format": self.data_format,
|
||||
"extrap_start_time": self.extrap_start_time,
|
||||
"output_mode": self.output_mode,
|
||||
}
|
||||
base_config = super(PredNet, self).get_config()
|
||||
return dict(list(base_config.items()) + list(config.items()))
|
||||
|
|
|
@ -1,11 +1,10 @@
|
|||
name: prednet_environment
|
||||
dependencies:
|
||||
- python=3.6.2
|
||||
- python=3.6
|
||||
|
||||
- pip:
|
||||
- keras==2.0.8
|
||||
- theano
|
||||
- tensorflow==1.8.0
|
||||
- keras
|
||||
- tensorflow==1.15
|
||||
- matplotlib
|
||||
- hickle
|
||||
- pandas
|
||||
|
|
|
@ -0,0 +1,25 @@
|
|||
# Azure ML Configuration
|
||||
|
||||
Setting up Azure ML requires the following steps:
|
||||
1. Configure AML workspace
|
||||
1. Create an Azure Service Principal
|
||||
1. Upload the data to the default datastore of your workspace
|
||||
|
||||
|
||||
## Configure AML workspace
|
||||
|
||||
The first step is to attach to an AML workspace.
|
||||
|
||||
If you don't have one yet, you can create one using this [documentation](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/service/setup-create-workspace.md#sdk).
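As a minimal sketch (assuming a `config.json` in the root of this repository, as described below), attaching looks like this:

```
from azureml.core import Workspace

# Reads subscription_id, resource_group, and workspace_name from config.json
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep="\n")
```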
|
||||
|
||||
## Create Azure Service Principal
|
||||
|
||||
This is necessary for non-interactive authentication. Create a service principal and give it *Contributor* access to your workspace (see [documentation](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.ipynb)).
|
||||
|
||||
Store the information in a [config.json](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/service/how-to-configure-environment.md#create-a-workspace-configuration-file) file in the root directory of this repository.
|
||||
|
||||
This repository contains a sample configuration file (`config/config_sample.json`) that shows which information needs to be included.
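For non-interactive runs, authentication can then go through the service principal instead of an interactive login. A sketch, with the credential values as placeholders you would take from your own service principal:

```
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder environment variables -- set them to your service principal's
# tenant id, application id, and password
sp_auth = ServicePrincipalAuthentication(
    tenant_id=os.environ["TENANT_ID"],
    service_principal_id=os.environ["SP_APP_ID"],
    service_principal_password=os.environ["SP_PASSWORD"],
)

ws = Workspace.from_config(auth=sp_auth)
```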
|
||||
|
||||
## Upload the data to the default datastore of your workspace
|
||||
|
||||
We upload the training data so that it can be mounted as a remote drive in the AML compute targets. You can use the method `upload_data` in `utils.py` for that.
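Presumably `upload_data` wraps the datastore upload API of the SDK; a minimal sketch of the direct call (the paths are illustrative):

```
from azureml.core import Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Illustrative paths -- point src_dir at your prepared training data
datastore.upload(
    src_dir="data/UCSDped1",
    target_path="prednet/UCSDped1",
    overwrite=True,
    show_progress=True,
)
```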
|
|
@ -1,54 +1,22 @@
|
|||
# Transfer Learning w/ AML Pipelines
|
||||
|
||||
So far we have looked into training one model, including hyperparameter tuning. Let's figure out how to scale our solution such that whenever a new dataset is uploaded to blob store, a new anomaly detection is automatically created for this dataset.
|
||||
|
||||
To speed things up and to potentially improve the performance of our model, we will use [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning).
|
||||
You can run the training script `train.py` locally to train your model. Let's figure out how to scale our solution so that whenever a new dataset is uploaded to the blob store, a new anomaly detection model is automatically created for it.
|
||||
|
||||
We create two AML pipelines to scale our solution.
|
||||
|
||||
First, we set up a training pipeline that:
|
||||
- Splits a video into individual frames.
|
||||
- Generates a keras graph for the model.
|
||||
- Sweeps hyperparameters to find the best model.
|
||||
|
||||
To scale our solution, we then define a second pipeline that:
|
||||
- Adds a Datastore monitor that triggers the creation of a new training pipeline with the above steps.
|
||||
- Ensures the new model is better than the currently deployed one.
|
||||
- Registers the best model for deployment as webservice.
|
||||
First, we create a master AML pipeline (`pipeline_master.py`). This pipeline monitors the Azure Blob Storage container for new data. Whenever new data is added, it checks whether the data comes from a new location for which we don't have a model yet. If so, it creates a new AML pipeline for that dataset by calling `pipelines_slave.py`.
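One way to wire up such a datastore monitor is an AML `Schedule` that polls the datastore for changes; the following is only a sketch, assuming the master pipeline has already been published under the (illustrative) name `prednet_master`:

```
from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline, Schedule

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Illustrative: look up the published master pipeline by name
master = next(
    p for p in PublishedPipeline.list(ws) if p.name == "prednet_master"
)

# Run the master pipeline whenever files on the datastore change
schedule = Schedule.create(
    ws,
    name="prednet-data-monitor",
    pipeline_id=master.id,
    experiment_name="prednet_master",
    datastore=datastore,
    polling_interval=5,  # minutes
)
```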
|
||||
|
||||
## AML pipeline for training the model
|
||||
|
||||
The AML pipeline for training a model is defined in `pipelines_create.py`.
|
||||
The AML pipeline for training a model is defined in `pipelines_slave.py`.
|
||||
|
||||
It contains the following steps (a minimal sketch of how such steps are assembled follows the list):
|
||||
1. video_decoding - extract individual frames of the video and store them in separate files (e.g. tiff)
|
||||
1. data_prep - scale and crop images so that their size matches the size of the input layer of our model.
|
||||
1. train_w_hyperdrive - train the model using hyperdrive
|
||||
1. register_model - register the model in the AML workspace for later deployment
|
||||
|
||||
### Transfer learning
|
||||
|
||||
Take a look at the definition of the hyperdrive step.
|
||||
|
||||
```
|
||||
ps = RandomParameterSampling(
|
||||
{
|
||||
[...]
|
||||
'--freeze_layers': choice("0, 1, 2", "1, 2, 3", "0, 1", "1, 2", "2, 3", "0", "3"),
|
||||
'--transfer_learning': choice("True", "False")
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
What this does is tell hyperdrive to explore whether transfer_learning benefits training. It also explores which layers to freeze during transfer learning.
|
||||
|
||||
If transfer_learning is performed, the `train.py` script looks for an existing model in the model registry, downloads it, and starts retraining it for the current dataset.
|
||||
|
||||
You may be wondering whether training will really be faster, even if we also have training runs without transfer learning. Those training runs could potentially take very long to converge. Luckily hyperdrive comes with an early termination policy, so that runs that are taking too long and are performing worse than other runs are immediately canceled.
|
||||
|
||||
```
|
||||
policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1, delay_evaluation=20)
|
||||
```
|
||||
1. register_prednet - register the model in the AML workspace for later deployment
|
||||
1. batch_scoring - perform batch scoring on the test data.
|
||||
1. train_classifier - the test data is labeled for whether a video frame contains anomalies. We train a classifier that uses the errors of the neural network model to predict whether a video frame contains an anomaly.
|
||||
1. register_classifier - register the trained classifier in the AML model registry
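As mentioned above, here is a minimal sketch of how such steps are assembled into a pipeline. The real step definitions (arguments, inputs/outputs, compute targets) live in `pipelines_slave.py`; the script and compute names below are illustrative:

```
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Two of the steps above, in simplified form and without data dependencies
video_decoding = PythonScriptStep(
    name="video_decoding",
    script_name="video_decoding.py",
    compute_target="cpu-cluster",
)
data_prep = PythonScriptStep(
    name="data_prep",
    script_name="data_preparation.py",
    compute_target="cpu-cluster",
)

pipeline = Pipeline(workspace=ws, steps=[video_decoding, data_prep])
pipeline.validate()
```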
|
||||
|
||||
## AML pipeline for dynamic generation of new training pipelines
|
||||
|
||||
|
|
|
@ -16,7 +16,7 @@ The webservice *returns* the mean squared error of the model's predictions of ea
|
|||
|
||||
## Create scoring script
|
||||
|
||||
We have created a scoring script for you: `score.py`. Let's take a look at what it does.
|
||||
We have created a scoring script for you: `deployment/score.py`. Let's take a look at what it does.
|
||||
|
||||
The scoring script runs when the webservice is deployed and also every time data is sent to the webservice for processing.
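The general contract of an AML scoring script is an `init()` function that runs once to load the model and a `run()` function that handles each request. The actual logic is in `deployment/score.py`; the following only sketches that shape, with the model name and file layout as assumptions:

```
import json

import numpy as np
from azureml.core.model import Model
from keras.models import model_from_json

from prednet import PredNet


def init():
    # Runs once when the webservice starts: load the registered model
    global model
    model_root = Model.get_model_path("prednet_UCSDped1")  # name is illustrative
    with open(model_root + "/model.json") as f:
        model = model_from_json(f.read(), custom_objects={"PredNet": PredNet})
    model.load_weights(model_root + "/weights.hdf5")


def run(raw_data):
    # Runs on every request: parse the payload, score it, return JSON
    X = np.array(json.loads(raw_data)["data"])
    prediction = model.predict(X)
    return json.dumps({"result": prediction.tolist()})
```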
|
||||
|
||||
|
|
|
@ -0,0 +1,5 @@
|
|||
# Fine Tuning
|
||||
|
||||
Check out the script `train.py`. You will notice several blocks that check whether we want to perform fine tuning. If that is the case, the script looks for a registered model in the AML Model Registry instead of training from scratch.
|
||||
|
||||
Note the additional input argument `freeze_layers` to the script. This allows us to explore which layers get to learn during fine-tuning.
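A minimal sketch of what layer freezing can look like in Keras; the comma-separated string mirrors how `--freeze_layers` arrives in `train.py`, while the mapping from indices to layers is an assumption to check against the training script:

```
def apply_layer_freezing(model, freeze_layers_arg):
    # freeze_layers_arg mirrors train.py's --freeze_layers, e.g. "0, 1, 2"
    freeze_layers = tuple(map(int, freeze_layers_arg.split(",")))
    # Mark the selected layers as non-trainable before compiling, so that
    # fine tuning only updates the remaining layers
    for i in freeze_layers:
        model.layers[i].trainable = False
    return model
```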
|
|
@ -1,48 +1,61 @@
|
|||
# Hyperparameter Tuning with HyperDrive
|
||||
# Hyperparameter Tuning with HyperDrive
|
||||
|
||||
> file: `hyperparameter_tuning.py`
|
||||
>
|
||||
Let's see whether we can improve the performance of our model by tuning the hyperparameters.
|
||||
Take a look at the definition of the HyperDrive step in `pipelines_slave.py`.
|
||||
|
||||
This requires the following steps:
|
||||
1. Configure AML workspace
|
||||
2. Upload the data to the default datastore of your workspace
|
||||
3. Define a remote AMLCompute compute target
|
||||
4. Prepare scripts for uploading to compute target
|
||||
5. Define Tensorflow estimator
|
||||
6. Configure HyperDrive run
|
||||
7. Submit job for execution.
|
||||
```
|
||||
est = Estimator(
|
||||
source_directory=script_folder,
|
||||
compute_target=gpu_compute_target,
|
||||
entry_script="train.py",
|
||||
node_count=1,
|
||||
environment_definition=env,
|
||||
)
|
||||
|
||||
ps = BayesianParameterSampling(
|
||||
{
|
||||
"--batch_size": choice(1, 2, 4, 10),
|
||||
"--filter_sizes": choice("3, 3, 3", "4, 4, 4", "5, 5, 5"),
|
||||
"--stack_sizes": choice(
|
||||
"48, 96, 192", "36, 72, 144", "12, 24, 48"
|
||||
),
|
||||
"--learning_rate": uniform(1e-6, 1e-3),
|
||||
"--lr_decay": uniform(1e-9, 1e-2),
|
||||
"--freeze_layers": choice(
|
||||
"0, 1, 2", "1, 2, 3", "0, 1", "1, 2", "2, 3", "0", "3"
|
||||
),
|
||||
"--transfer_learning": choice("True", "False"),
|
||||
}
|
||||
)
|
||||
|
||||
## Configure AML workspace
|
||||
hdc = HyperDriveConfig(
|
||||
estimator=est,
|
||||
hyperparameter_sampling=ps,
|
||||
primary_metric_name="val_loss",
|
||||
primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
|
||||
max_total_runs=1,
|
||||
max_concurrent_runs=1,
|
||||
max_duration_minutes=60 * 6,
|
||||
)
|
||||
|
||||
First step is to attach to an AML workspace.
|
||||
train_prednet = HyperDriveStep(
|
||||
"train_w_hyperdrive",
|
||||
hdc,
|
||||
estimator_entry_script_arguments=[
|
||||
"--preprocessed_data",
|
||||
preprocessed_data,
|
||||
"--remote_execution",
|
||||
"--dataset",
|
||||
dataset,
|
||||
# "--hd_child_cwd",
|
||||
# hd_child_cwd
|
||||
],
|
||||
inputs=[preprocessed_data],
|
||||
outputs=[hd_child_cwd],
|
||||
metrics_output=data_metrics,
|
||||
allow_reuse=True,
|
||||
)
|
||||
```
|
||||
|
||||
If you don't have one yet, you can create one using this [documentation](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/service/setup-create-workspace.md#sdk).
|
||||
This tells HyperDrive to explore whether `transfer_learning` benefits training, and which layers to freeze during transfer learning.
|
||||
|
||||
We recommend storing the information in a [config.json](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/service/how-to-configure-environment.md#create-a-workspace-configuration-file) file in the root directory of this repository.
|
||||
|
||||
|
||||
## Upload the data to the default datastore of your workspace
|
||||
|
||||
We upload the training data so that it can be mounted as remote drives in the aml compute targets.
|
||||
|
||||
## Define a remote AMLCompute compute target
|
||||
|
||||
We are using a `Standard_NC6` virtual machine. It is inexpensive and includes a GPU powerful enough for this purpose.
|
||||
|
||||
## Prepare scripts for uploading to compute target
|
||||
|
||||
The training script and dependencies have to be available to the job running on the compute target.
|
||||
|
||||
## Define Tensorflow estimator
|
||||
|
||||
HyperDrive works best if we use an Estimator specifically defined for tensorflow models.
|
||||
|
||||
## Configure HyperDrive run
|
||||
|
||||
Next, we define which hyperparameters to search over, and which strategy to use for searching. Here, we are using `RandomParameterSampling` and a `BanditPolicy`.
|
||||
|
||||
## Submit job for execution.
|
||||
|
||||
Now everything is good to go.
|
||||
If `transfer_learning` is enabled, the `train.py` script looks for an existing model in the model registry, downloads it, and retrains it on the current dataset.
|
||||
|
|
|
@ -55,7 +55,7 @@ if os.path.exists(script_folder):
|
|||
os.makedirs(script_folder)
|
||||
|
||||
shutil.copy(os.path.join(base_dir, "utils.py"), script_folder)
|
||||
shutil.copy(os.path.join(base_dir, "pipelines_create.py"), script_folder)
|
||||
shutil.copy(os.path.join(base_dir, "pipelines_slave.py"), script_folder)
|
||||
shutil.copy(os.path.join(base_dir, "train.py"), script_folder)
|
||||
shutil.copytree(
|
||||
os.path.join(base_dir, "models"),
|
||||
|
@ -144,15 +144,11 @@ env.register(ws)
|
|||
runconfig = RunConfiguration()
|
||||
runconfig.environment = env
|
||||
print("PipelineData object created")
|
||||
# runconfig.environment.docker.enabled = True
|
||||
# runconfig.environment.docker.gpu_support = False
|
||||
# runconfig.environment.docker.base_image = DEFAULT_CPU_IMAGE
|
||||
# runconfig.environment.spark.precache_packages = False
|
||||
|
||||
|
||||
create_pipelines = PythonScriptStep(
|
||||
name="create pipelines",
|
||||
script_name="pipelines_create.py",
|
||||
script_name="pipelines_slave.py",
|
||||
compute_target=cpu_compute_target,
|
||||
arguments=[
|
||||
"--cpu_compute_name",
|
||||
|
|
|
@ -131,7 +131,7 @@ def build_prednet_pipeline(dataset, ws):
|
|||
"--freeze_layers": choice(
|
||||
"0, 1, 2", "1, 2, 3", "0, 1", "1, 2", "2, 3", "0", "3"
|
||||
),
|
||||
"--transfer_learning": choice("True", "False"),
|
||||
"--fine_tuning": choice("True", "False"),
|
||||
}
|
||||
)
|
||||
|
27
train.py
|
@ -48,13 +48,6 @@ parser.add_argument(
|
|||
dest="preprocessed_data",
|
||||
help="data folder mounting point",
|
||||
)
|
||||
# parser.add_argument(
|
||||
# "--hd_child_cwd",
|
||||
# default="./prednet_path",
|
||||
# type=str,
|
||||
# dest="hd_child_cwd",
|
||||
# help="data where model is stored",
|
||||
# )
|
||||
parser.add_argument(
|
||||
"--learning_rate",
|
||||
default=1e-3,
|
||||
|
@ -117,10 +110,10 @@ parser.add_argument(
|
|||
required=False,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--transfer_learning",
|
||||
dest="transfer_learning",
|
||||
"--fine_tuning",
|
||||
dest="fine_tuning",
|
||||
default="True",
|
||||
help="use the benchmark model and perform transfer learning",
|
||||
help="use the benchmark model and perform fine tuning",
|
||||
type=str,
|
||||
required=False,
|
||||
)
|
||||
|
@ -155,8 +148,8 @@ preprocessed_data = os.path.join(
|
|||
args.preprocessed_data,
|
||||
args.dataset)
|
||||
batch_size = args.batch_size
|
||||
transfer_learning = str2bool(args.transfer_learning)
|
||||
if len(args.freeze_layers) > 0 and transfer_learning:
|
||||
fine_tuning = str2bool(args.fine_tuning)
|
||||
if len(args.freeze_layers) > 0 and fine_tuning:
|
||||
freeze_layers = tuple(map(int, args.freeze_layers.split(",")))
|
||||
else:
|
||||
freeze_layers = []
|
||||
|
@ -214,7 +207,7 @@ if remote_execution:
|
|||
run.log("dataset", args.dataset)
|
||||
run.log("batch_size", batch_size)
|
||||
run.log("freeze_layers", args.freeze_layers)
|
||||
run.log("transfer_learning", args.transfer_learning)
|
||||
run.log("fine_tuning", args.fine_tuning)
|
||||
|
||||
# model parameters
|
||||
A_filt_size, Ahat_filt_size, R_filt_size = filter_sizes
|
||||
|
@ -249,7 +242,7 @@ X_test_file = os.path.join(preprocessed_data, "X_test.hkl")
|
|||
y_test_file = os.path.join(preprocessed_data, "y_test.hkl")
|
||||
test_sources = os.path.join(preprocessed_data, "sources_test.hkl")
|
||||
|
||||
if transfer_learning:
|
||||
if fine_tuning:
|
||||
print("Performing transfer learning.")
|
||||
from azureml.core import Workspace
|
||||
from azureml.core.authentication import ServicePrincipalAuthentication
|
||||
|
@ -292,9 +285,9 @@ if transfer_learning:
|
|||
"learning!"
|
||||
)
|
||||
print(e)
|
||||
transfer_learning = False
|
||||
fine_tuning = False
|
||||
|
||||
# if transfer_learning:
|
||||
# if fine_tuning:
|
||||
# # load model from json file
|
||||
# # todo, this is going to the real one
|
||||
# json_file = open(os.path.join(model_root, "model.json"), "r")
|
||||
|
@ -410,7 +403,7 @@ optimizer = Adam(lr=learning_rate, decay=lr_decay)
|
|||
# put it all together
|
||||
model = keras.models.Model(inputs=inputs, outputs=final_errors)
|
||||
model.compile(loss=loss_type, optimizer=optimizer)
|
||||
if transfer_learning:
|
||||
if fine_tuning:
|
||||
model.load_weights(
|
||||
os.path.join(model_root, "outputs", "weights.hdf5"),
|
||||
by_name=True,
|
||||
|
|