renamed some files and their references
This commit is contained in:
Parent
d6bd39ee46
Commit
7f41144a82
|
@ -73,11 +73,10 @@ The recommended first step is to clone this repository.
|
|||
|
||||
### Getting Started
|
||||
|
||||
1. [Data Preparation](./docs/data_prep_w_pillow.md) - Download and prepare data for training/testing.
|
||||
1. [Data Preparation](./docs/data_preparation.md) - Download and prepare data for training/testing.
|
||||
1. [Azure ML Configuration](./docs/aml_configuration.md) - Configure your Azure ML workspace.
|
||||
1. [AML Pipelines](./docs/aml_pipelines.md) - Automate data preparation, training, and re-training.
|
||||
1. [Deployment](./docs/deployment.md)
|
||||
1. [MLOps](./docs/mlops.md) - How to quickly scale your solution with the MLOps extension for DevOps.
|
||||
1. [Deployment](./docs/deployment.md) - How to deploy your anomaly detection as a webservice on AKS.
|
||||
|
||||
### Deep-dive
|
||||
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
# Anomaly Detection
|
||||
|
||||
We have prepared our data (`data_prep.py`) and trained our model (`train.py`). Next, we see whether we can use it to detect anomalies in video sequences.
|
||||
We have prepared our data (`data_preparation.py`) and trained our model (`train.py`). Next, we see whether we can use it to detect anomalies in video sequences.
|
||||
|
||||
This is done in two steps:
|
||||
1. Apply the trained model to the test dataset, i.e. to videos that contain anomalies.
|
||||
|
@ -12,17 +12,13 @@ The UCSD dataset is a bit unusual in the sense that it actually contains informa
|
|||
|
||||
## Apply the trained model to the test dataset
|
||||
|
||||
Here, we use the script `test.py`. The script begins by loading the data.
|
||||
Here, we use the script `batch_scoring.py`. The script begins by loading the data.
|
||||
|
||||
|
||||
### Configuration
|
||||
|
||||
There are some minimal settings we have to perform here. Most things don't change from how we configured the model and data for training, so we don't have to go over those again.
|
||||
|
||||
However, one critical change is that we trained the model on video sequences of 10 frames, while the test data contains video sequences of 200 frames. We are going to show these to the model in their entirety, so we set `nt` to 200.
|
||||
|
||||
> If you anticipate that your model will have to work with videos of variable lengths, you could pad shorter sequences with empty frames at the end.
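A minimal sketch of that padding idea, assuming each video is a NumPy array of shape (n_frames, height, width, depth) and that an "empty" frame means an all-zero frame. The helper name `pad_sequence` is hypothetical and not part of this repository:

```python
import numpy as np

def pad_sequence(frames, nt):
    """Pad a video with all-zero frames at the end until it has nt frames."""
    n_missing = nt - frames.shape[0]
    if n_missing < 0:
        raise ValueError("sequence is longer than nt")
    # Pad only along the first (time) axis, after the existing frames.
    pad_width = [(0, n_missing)] + [(0, 0)] * (frames.ndim - 1)
    return np.pad(frames, pad_width, mode="constant", constant_values=0)

# Hypothetical example: a 150-frame video padded to nt = 200.
short_video = np.ones((150, 4, 4, 3), dtype=np.float32)
padded = pad_sequence(short_video, nt=200)
```

The original frames stay untouched at the start of the array, so per-frame metrics for the padded tail can be discarded afterwards.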
|
||||
|
||||
Another thing we need to do is configure what kind of output we want the model to produce. Here, we set it to output its prediction for the next frame, that is, the activation in the A_hat layer:
|
||||
|
||||
```
|
||||
|
@ -46,11 +42,9 @@ The last step is to save these metrics to a pickled pandas dataframe.
|
|||
|
||||
## Annotate results with labels from test dataset
|
||||
|
||||
> `annotate_results.py`
|
||||
Next, we load the labels (`y_test`) for the test dataset to annotate our results.
|
||||
|
||||
Next, we load the labels for the test dataset to annotate our results.
|
||||
|
||||
Our results dataframe (from running `test.py`) contains one row per video frame and various metrics for that frame, which tell us how well the model predicted it (e.g. mean squared error). We want to add a column that tells us whether each frame contains an anomaly.
|
||||
Our results dataframe (from running `batch_scoring.py`) contains one row per video frame and various metrics for that frame, which tell us how well the model predicted it (e.g. mean squared error). We want to add a column that tells us whether each frame contains an anomaly.
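As a rough illustration of this annotation step, the sketch below adds a label column to a small results dataframe. The column names (`frame`, `mse`, `anomaly`) and all values are hypothetical and chosen only for this example; the actual script may organize its data differently:

```python
import pandas as pd

# Hypothetical results: one row per video frame with a prediction-error metric.
results = pd.DataFrame({"frame": [0, 1, 2, 3], "mse": [0.01, 0.02, 0.30, 0.25]})

# Hypothetical labels: 1 if the frame contains an anomaly, 0 otherwise.
y_test = [0, 0, 1, 1]

# The labels must line up one-to-one with the rows of the results.
if len(y_test) != len(results):
    raise ValueError("labels and results have different lengths")
results["anomaly"] = y_test

# Average error for normal vs. anomalous frames.
mean_mse = results.groupby("anomaly")["mse"].mean()
```

With the label in place, grouping by `anomaly` gives a first impression of whether a metric separates normal frames from anomalous ones.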
|
||||
|
||||
## Explore relationship between model metrics and anomalies
|
||||
|
||||
|
|
|
@ -1,67 +0,0 @@
|
|||
# Data Preparation
|
||||
|
||||
You can execute the following script to prepare data locally:
|
||||
|
||||
> file: `data_preparation.py`
|
||||
> runtime: ~1 minute
|
||||
|
||||
Note: As mentioned in the [README](../README.md) file, consider using a Conda environment. You can create one with the command: `conda env create -f config/environment.yml`
|
||||
|
||||
## Download the data
|
||||
|
||||
Download the data from the [UCSD website](http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm) and unpack it in the `data` subdirectory of the root folder of your clone of this repository.
|
||||
|
||||
For example, you could run the following in Bash:
|
||||
```
|
||||
cd /tmp
|
||||
wget http://www.svcl.ucsd.edu/projects/anomaly/UCSD_Anomaly_Dataset.tar.gz
|
||||
cd /home/wopauli/MLOps_VideoAnomalyDetection/data
|
||||
tar xzvf /tmp/UCSD_Anomaly_Dataset.tar.gz
|
||||
```
|
||||
|
||||
You can tell whether you have the data in the right location by checking whether the following path exists:
|
||||
|
||||
``MLOps_VideoAnomalyDetection/data/UCSD_Anomaly_Dataset.v1p2/UCSDped1/Train/Train001``
|
||||
|
||||
## Data prep
|
||||
|
||||
The next step for us is to get our data into shape for training the model.
|
||||
|
||||
1. Split the data into sets for training, validation, and testing.
|
||||
2. Load the individual images, then:
|
||||
- Resize the image to match the size of our model.
|
||||
- Insert them to a numpy array which can be used for training (dimensions: n_images, height, width, depth).
|
||||
- Create a second array that contains the folder in which each video frame was stored.
|
||||
3. [Hickle](https://github.com/telegraphic/hickle) the created arrays to a binary [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) file for faster I/O during training.
|
||||
|
||||
|
||||
## Data Split
|
||||
|
||||
We use `train_test_split` from `sklearn.model_selection` to split the *normal* videos randomly into videos for training the model and videos for validation. We continuously perform model validation during training, to see how well the model does with videos that haven't been used during training.
|
||||
|
||||
We also create a dataset for *testing*. Here, we are using the videos which contain anomalies, to see whether our approach allows us to detect anomalies.
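A minimal sketch of such a split, assuming each video is identified by its folder name. The `Test...` folder names, the 80/20 ratio, and the fixed `random_state` are assumptions made for illustration, not settings taken from `data_preparation.py`:

```python
from sklearn.model_selection import train_test_split

# Folders holding the *normal* videos (names follow the Train001 pattern).
normal_videos = [f"Train{i:03d}" for i in range(1, 11)]

# Randomly hold out 20% of the normal videos for validation (assumed ratio).
train_videos, val_videos = train_test_split(
    normal_videos, test_size=0.2, random_state=0
)

# Videos containing anomalies are kept apart for testing (hypothetical names).
test_videos = [f"Test{i:03d}" for i in range(1, 4)]
```

Splitting by video folder rather than by individual frame keeps all frames of one sequence in the same dataset.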
|
||||
|
||||
## Resize the images
|
||||
|
||||
We resize the images so that they match the size of the input layer of our model. You may ask why we don't just change the size of the input layer to match the size of our images. The answer is that it's complicated:
|
||||
|
||||
1. There are constraints on the possible dimensions of the input layer. We'll go into more detail on this topic in the step [model_development](./model_development.md).
|
||||
2. You may need to change the input size depending on the compute hardware you are using (e.g. a Tesla K80 card does not have as much memory as a Tesla P100).
|
||||
|
||||
> Things you could add here:
|
||||
> - Crop images, to remove parts you are not interested in.
|
||||
> - Blur images, this can sometimes help with convergence.
|
||||
> - Rotate images, which can help with generalization to videos that were recorded at different angles.
|
||||
> - Convert to gray-scale. If you have videos that were recorded in color (RGB), you could convert them to gray-scale. (We are actually converting gray-scale images to RGB format here. This isn't necessary for this dataset, but it lets us keep a model architecture that also works with color videos.)
|
||||
|
||||
## Build numpy arrays to hold video data and file folders
|
||||
|
||||
We then insert the preprocessed video frames into numpy arrays, one array for each dataset split. Each array has the dimensions (n_images, height, width, depth).
|
||||
|
||||
We create a second array that records, for each video frame, the folder it was stored in. We will use this information to determine which video sequence a video frame belongs to.
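The two arrays can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the image size, the folder names, the frame counts, and the dummy pixel values are all hypothetical, and a real pipeline would load and resize each image where the placeholder frame is created:

```python
import numpy as np

height, width, depth = 8, 8, 3  # hypothetical (small) image size
frames_per_folder = {"Train001": 3, "Train002": 2}  # hypothetical frame counts

n_images = sum(frames_per_folder.values())
X = np.zeros((n_images, height, width, depth), dtype=np.uint8)
sources = []

i = 0
for folder, n_frames in frames_per_folder.items():
    for _ in range(n_frames):
        # Placeholder for "load the image and resize it to the model's input size".
        X[i] = np.full((height, width, depth), i, dtype=np.uint8)
        # Remember which folder (i.e. which video sequence) this frame came from.
        sources.append(folder)
        i += 1

sources = np.array(sources)
```

Because `X` and `sources` share the same first axis, `X[sources == "Train001"]` selects all frames of one video sequence.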
|
||||
|
||||
## Save the processed video data
|
||||
|
||||
We [Hickle](https://github.com/telegraphic/hickle) the created arrays to a binary [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) file.
|
||||
|
||||
Note that this binary file has the potential to expose you to version skew. That is, you won't be able to load data into Python 3 if it was stored in Python 2.
|
|
@ -25,18 +25,10 @@ Apart from importing the modules needed for processing data and creating the net
|
|||
1. init - which is executed only once, when the webservice is started.
|
||||
2. run - which is executed every time data is sent to the webservice for processing.
|
||||
|
||||
|
||||
## Create a docker image
|
||||
|
||||
Filename: `deployment/created_docker_image.py`
|
||||
|
||||
We use the above scoring script, define dependencies for a conda environment in which to execute the scoring script, and also include all custom scripts needed by the scoring script.
|
||||
|
||||
|
||||
## Deploy the docker image as a webservice
|
||||
|
||||
Filename: `deploy_aci.py`
|
||||
Filename: `deployment/deploy.py`
|
||||
|
||||
## Test the webservice
|
||||
|
||||
Use the script `test_aci.py` to see whether your webservice behaves as expected.
|
||||
Use the script `deployment/test_webservice.py` to see whether your webservice behaves as expected.
|
||||
|
|