Parent
997a54b73d
Commit
468ec60433
45
README.md
|
@ -1,17 +1,54 @@
|
|||
# Computer Vision Best Practices
|
||||
|
||||
This repository will provide examples and best practices for building Computer Vision systems, provided as Jupyter notebooks, and using PyTorch as Deep Learning library. Image classification will be covered first, followed by object detection and image similarity.
|
||||
This repository provides implementations and best practice guidelines for building Computer Vision systems. All examples are given as Jupyter notebooks and use PyTorch as the Deep Learning library.
|
||||
|
||||
[![Build Status](https://dev.azure.com/best-practices/computervision/_apis/build/status/Build-UnitTest?branchName=staging)](https://dev.azure.com/best-practices/computervision/_build/latest?definitionId=2&branchName=staging)
|
||||
|
||||
## Planning etc documents
|
||||
## Overview
|
||||
|
||||
All feature planning is done via projects, milestones, and issues in this Github repository.
|
||||
The goal of this repository is to help speed up development of Computer Vision applications. Rather than implementing custom approaches, the focus is on providing examples and links to existing state-of-the-art libraries. In addition, having worked in this space for many years, we aim to answer common questions, point out often observed pitfalls, and show how to use the cloud for deployment and training.
|
||||
|
||||
Currently, the main priority is image classification and, to a lesser extent, image segmentation. We are also actively working on a basic (but often sufficiently accurate) example of how to do image similarity. Object detection is scheduled to start once image classification is completed. See the projects and milestones in this repository for more details.
|
||||
|
||||
|
||||
## Getting Started
|
||||
|
||||
Instructions to get started are provided in the [image classification README.md](image_classification/README.md) file.
|
||||
Instructions on how to get started, as well as our example notebooks and discussions are provided in the [image classification](image_classification/README.md) subfolder.
|
||||
|
||||
Note that for certain Computer Vision problems, ready-made or easily customizable solutions exist which do not require any custom coding or machine learning expertise. We strongly recommend evaluating if any of these address the problem at hand. Only if that is not the case, or if the accuracy of these solutions is not sufficient, do we recommend the much more time-consuming and difficult (since it requires expert knowledge) path of building custom models.
|
||||
|
||||
These Microsoft services address common Computer Vision tasks:
|
||||
|
||||
- [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/directory/vision/)
|
||||
Pre-trained REST APIs which can be called to do e.g. image classification, face recognition, OCR, video analytics, and much more. These APIs are easy to use and work out of the box (e.g. no training required), however customization is limited. See the various demos to get a feeling for their functionality, e.g. on this [site](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/).
|
||||
|
||||
|
||||
- [Custom Vision Service](https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/)
|
||||
SaaS service to train and deploy a model as a REST API given a user-provided training set. All steps, from image upload and annotation to model deployment, can be performed either using a UI or, optionally, a Python SDK. Training both image classification and object detection models is supported and requires only minimal machine learning knowledge. The Custom Vision Service hence offers more flexibility than the pre-trained Cognitive Services APIs, but requires users to bring and annotate their own datasets.
|
||||
|
||||
- [Azure Machine Learning service (AzureML)](https://azure.microsoft.com/en-us/services/machine-learning-service/)
|
||||
Scenario-agnostic machine learning service that helps users accelerate training and deploying machine learning models. While not specific to Computer Vision workloads, the AzureML Python SDK can be used to deploy scalable and reliable web services using e.g. Kubernetes, or to run heavily parallel training on a cloud-based GPU cluster. AzureML offers significantly more flexibility than the other options above, but it also requires significantly more machine learning and programming knowledge.
|
||||
|
||||
|
||||
## Computer Vision Domains
|
||||
|
||||
Most applications in Computer Vision fall into one of these 4 categories:
|
||||
|
||||
- **Image classification**: Given an input image, predict what objects are present. This is typically the easiest CV problem to solve, but it requires objects to be reasonably large in the image.
|
||||
|
||||
<img align="center" src="https://cvbp.blob.core.windows.net/public/images/document_images/intro_ic_vis.jpg" height="150" alt="Image classification visualization"/>
|
||||
|
||||
- **Object Detection**: Given an input image, predict what objects are present and where the objects are (using rectangular coordinates). Object detection approaches work even if the object is small. However, model training takes longer than for image classification, and manually annotating images is more time-consuming.
|
||||
|
||||
<img align="center" src="https://cvbp.blob.core.windows.net/public/images/document_images/intro_od_vis.jpg" height="150" alt="Object detection visualization"/>
|
||||
|
||||
- **Image Similarity**: Given an input image, find all similar images in a reference dataset. Here, rather than predicting a label or a rectangle, the task is to sort the images of a reference dataset by their similarity to the query image.
|
||||
|
||||
<img align="center" src="https://cvbp.blob.core.windows.net/public/images/document_images/intro_is_vis.jpg" height="150" alt="Image similarity visualization"/>
|
||||
|
||||
- **Image Segmentation**: Given an input image, assign a label to every pixel, e.g. background, bottle, hand, sky, etc. In practice, this problem is less common in industry, in large part due to the segmentation masks required during training.
|
||||
|
||||
<img align="center" src="https://cvbp.blob.core.windows.net/public/images/document_images/intro_iseg_vis.jpg" height="150" alt="Image segmentation visualization"/>
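
The image similarity task described above, sorting a reference dataset by similarity to a query image, can be sketched in a few lines. This is only an illustration under stated assumptions: each image is assumed to be already represented by a feature vector (for example taken from a pre-trained CNN), cosine similarity is assumed as the similarity measure, and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, not part of this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def rank_by_similarity(query, reference):
    """Sort reference image names from most to least similar to the query.

    `reference` maps an image name to its feature vector (assumed to be
    computed beforehand, e.g. by a pre-trained CNN).
    """
    return sorted(
        reference,
        key=lambda name: cosine_similarity(query, reference[name]),
        reverse=True,
    )


# Toy 2-dimensional features, made up purely for illustration.
reference = {"a.jpg": [1.0, 0.0], "b.jpg": [0.9, 0.1], "c.jpg": [0.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.0], reference)
```

In a real system the feature vectors would have hundreds or thousands of dimensions, and a different distance measure may work better for a given dataset.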
|
||||
|
||||
|
||||
## Contributing
|
||||
|
|
|
@ -0,0 +1,51 @@
|
|||
# Image classification
|
||||
|
||||
## Frequently asked questions
|
||||
|
||||
|
||||
* General
|
||||
* [How does the technology work?](#how-does-the-technology-work)
|
||||
* [Which problems can be solved using image classification, and which ones cannot?](#which-problems-can-be-solved-using-image-classification)
|
||||
* Data
|
||||
* [How many images are required to train a model?](#how-many-images-are-required-to-train-a-model)
|
||||
* [How to annotate images?](#how-to-annotate-images)
|
||||
* [How to split into training and test images?](#how-to-split-into-training-and-test-images)
|
||||
* [How to design a good test set?](#how-to-design-a-good-test-set)
|
||||
* [How to speed up training?](#how-to-speed-up-training)
|
||||
* Training
|
||||
* [How to improve accuracy or inference speed?](#how-to-improve-accuracy-or-inference-speed)
|
||||
|
||||
### How does the technology work?
|
||||
State-of-the-art image classification methods, such as those used in this repository, are based on Convolutional Neural Networks (CNNs). CNNs are a special group of Deep Learning approaches shown to work well on image data. The key is to use CNNs which were already trained on millions of images (the ImageNet dataset) and to fine-tune these pre-trained CNNs using a potentially much smaller custom dataset. This is the approach also taken in this repository. The web is full of introductions to these concepts, such as [this one](https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac).
|
||||
|
||||
|
||||
### Which problems can be solved using image classification?
|
||||
Image classification can be used if the object-of-interest is relatively large in the image, e.g. more than 20% image width/height. If the object is smaller, or if the location of the object is required, then object detection methods should be used instead.
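
As a rough way to apply this rule of thumb, the relative size of an object can be checked from a bounding box around it. The sketch below is a minimal illustration, assuming the 20% threshold is met when the box spans at least 20% of the image width or of the image height; the function name `is_large_enough` and its parameters are hypothetical.

```python
def is_large_enough(box_w, box_h, im_w, im_h, min_fraction=0.2):
    """Return True if the object box spans at least `min_fraction` of the
    image width or of the image height (an assumed reading of the 20% rule)."""
    if im_w <= 0 or im_h <= 0:
        raise ValueError("image dimensions must be positive")
    return (box_w / im_w) >= min_fraction or (box_h / im_h) >= min_fraction


# A 300x50 pixel object in a 1000x800 image covers 30% of the width.
large = is_large_enough(300, 50, 1000, 800)
# A 50x40 pixel object covers only 5% of each dimension.
small = is_large_enough(50, 40, 1000, 800)
```

If most objects in a dataset fail such a check, object detection is likely the better fit.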
|
||||
|
||||
|
||||
### How many images are required to train a model?
|
||||
This depends heavily on the complexity of the problem. For example, if the object-of-interest looks very different from image to image (viewing angle, lighting condition, etc) then more training images are required for the model to learn the appearance of the object.
|
||||
|
||||
In practice, we have seen good results using 100 images per class, or sometimes fewer. The only way to find out how many images are required is to train the model using an increasing number of images while observing how the accuracy improves (and keeping the test set fixed). Once the accuracy improvements become small, this indicates that more training images are not required.
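
The procedure above can be sketched as a small helper that inspects a learning curve. This is an illustration only: the name `images_needed`, the `min_gain` threshold of one accuracy point, and the example numbers are assumptions made for this sketch, not measurements from this repository.

```python
def images_needed(curve, min_gain=0.01):
    """Find where a learning curve flattens.

    `curve` is a list of (num_training_images, test_accuracy) pairs, sorted
    by number of images and measured on the same fixed test set. Returns the
    first training-set size after which adding more images improved accuracy
    by less than `min_gain`, or None if accuracy is still improving.
    """
    for (n_prev, acc_prev), (_, acc_next) in zip(curve, curve[1:]):
        if acc_next - acc_prev < min_gain:
            return n_prev
    return None


# Hypothetical accuracies for 25, 50, 100, and 200 images per class.
curve = [(25, 0.80), (50, 0.88), (100, 0.93), (200, 0.935)]
plateau = images_needed(curve)
```

Here the gain from 100 to 200 images is only half a point, so the helper reports 100 as sufficient under the chosen threshold.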
|
||||
|
||||
|
||||
### How to annotate images?
|
||||
Consistency is key. For example, occluded objects should either always be annotated, or never. Furthermore, ambiguous images should be removed, e.g. if it is unclear to the human eye whether an image shows a lemon or a tennis ball. Ensuring consistency is difficult, especially if multiple people are involved, and hence our recommendation is that only a single person, the one who trains the AI model, annotates all images. This has the added benefit of gaining a better understanding of the images and of the complexity of the classification task.
|
||||
|
||||
Note that the test set should be of high annotation quality, so that accuracy estimates are reliable.
|
||||
|
||||
|
||||
### How to split into training and test images?
|
||||
Often a random split, as is performed in the notebooks, is fine. However, there are exceptions: for example, if the images are extracted from a movie, then having frame *n* in the training set and frame *n+1* in the test set would result in accuracy estimates which are over-inflated since the two images are too similar.
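
One way to avoid this kind of leakage is to split by group (for example by video clip) rather than by individual image, so that all frames from one clip end up on the same side. The sketch below is a minimal illustration; `split_by_group` and the file naming scheme are hypothetical and not taken from the notebooks.

```python
import random
from collections import defaultdict


def split_by_group(items, group_of, test_fraction=0.2, seed=0):
    """Split items into train/test so that no group is shared between them.

    `group_of` maps an item to its group id, e.g. the video an image frame
    was extracted from. At least one whole group is placed in the test set.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)  # reproducible shuffle
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [it for g in group_ids if g not in test_ids for it in groups[g]]
    test = [it for g in group_ids if g in test_ids for it in groups[g]]
    return train, test


# Hypothetical frames named "<clip>_<frame>.jpg": 5 clips with 3 frames each.
frames = [f"clip{c}_{i:04d}.jpg" for c in range(5) for i in range(3)]
train, test = split_by_group(frames, group_of=lambda p: p.split("_")[0])
```

The split fraction is applied to groups, not images, so the resulting image counts only match `test_fraction` approximately when groups differ in size.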
|
||||
|
||||
|
||||
### How to design a good test set?
|
||||
The test set should contain images which resemble what the input to the trained model looks like when deployed. For example, images taken under similar lighting conditions, similar angles, etc. This is to ensure that the accuracy estimate reflects the real performance of the application which uses the trained model.
|
||||
|
||||
|
||||
### How to speed up training?
|
||||
- All images should be stored on an SSD device, since HDD or network access times can dominate the training time due to high latency.
|
||||
- Very high-resolution images (>4 MegaPixels) should be downsized before DNN training since JPEG decoding is expensive and can slow down training by a factor of >10x.
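
To downsize images ahead of training, a target resolution is needed that stays under the pixel budget while preserving the aspect ratio. The sketch below only computes that target size; the function name `downsized_shape` is hypothetical, the 4 megapixel default mirrors the threshold mentioned above, and the actual resizing would be done with an image library of your choice.

```python
import math


def downsized_shape(width, height, max_pixels=4_000_000):
    """Return a (width, height) with the same aspect ratio and at most
    `max_pixels` pixels. Images already within budget are left unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    n_pixels = width * height
    if n_pixels <= max_pixels:
        return width, height
    # Scaling both sides by sqrt(ratio) scales the pixel count by ratio.
    scale = math.sqrt(max_pixels / n_pixels)
    return max(1, int(width * scale)), max(1, int(height * scale))


# A 24 megapixel photo is reduced; a 1 megapixel image is left as-is.
new_w, new_h = downsized_shape(6000, 4000)
unchanged = downsized_shape(1000, 1000)
```

Rounding down guarantees the result never exceeds the budget, at the cost of a sub-pixel change in aspect ratio.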
|
||||
|
||||
### How to improve accuracy or inference speed?
|
||||
See the [02_training_accuracy_vs_speed.ipynb](.notebooks/02_training_accuracy_vs_speed.ipynb) notebook for a discussion of which parameters are important, and of how to select a model which is fast during inference.
|
|
@ -1,10 +1,22 @@
|
|||
# Image classification
|
||||
|
||||
This directory provides examples and best practices for building image classification systems. We recommend to use PyTorch as Deep Learning library due to its ease of use, simple debugging, and popularity in the data science community. For Computer Vision functionality, we rely heavily on [fast.ai](https://github.com/fastai/fastai), one of the most well-known PyTorch data science libraries, which comes with rich feature support and extensive documentation.
|
||||
This directory provides examples and best practices for building image classification systems. Our goal is to enable users to bring their own datasets and train a high-accuracy classifier easily and quickly. To this end, we provide example notebooks with pre-set default parameters shown to work well on a variety of datasets, and extensive documentation of common pitfalls, best practices, etc. In addition, we show how to use the Azure cloud to e.g. deploy models as a web service, or to speed up training on large datasets using the power of the cloud.
|
||||
|
||||
Our goal is to enable users to bring their own datasets and train a high-accuracy classifier easily and quickly. To this end, we provide example notebooks with pre-set default parameters shown to work well on a variety of datasets, and extensive documentation of common pitfalls, best practices, etc. In addition, we show how to use the Azure cloud to e.g. deploy models as a web service, or to speed up training on large datasets using the power of the cloud.
|
||||
|
||||
See also fast.ai's [documentation](https://docs.fast.ai/) and most recent [course](https://github.com/fastai/course-v3) for more explanations and code examples.
|
||||
We recommend using PyTorch as the Deep Learning library due to its ease of use, simple debugging, and popularity in the data science community. For Computer Vision functionality, we rely heavily on [fast.ai](https://github.com/fastai/fastai), one of the most well-known PyTorch data science libraries, which comes with rich feature support and extensive documentation. To get a better understanding of the underlying technology, we highly recommend watching the [2019 fast.ai lecture series](https://course.fast.ai/videos/?lesson=1) and going through fast.ai's [documentation](https://docs.fast.ai/).
|
||||
|
||||
|
||||
## Notebooks
|
||||
|
||||
We provide several notebooks to show how image classification algorithms can be designed, evaluated and operationalized. Note that the notebooks starting with 0 are meant to be "required", while all other notebooks are optional.
|
||||
|
||||
| Notebook name | Description |
|
||||
| --- | --- |
|
||||
| [00_webcam.ipynb](.notebooks/00_webcam.ipynb)| Quick start notebook which demonstrates how to load a trained model and run inference using a single image or webcam input. |
|
||||
| [01_training_introduction.ipynb](.notebooks/01_training_introduction.ipynb)| Notebook which explains some of the basic concepts around model training and evaluation.|
|
||||
| [02_training_accuracy_vs_speed.ipynb](.notebooks/02_training_accuracy_vs_speed.ipynb)| Notebook to train a model with e.g. high accuracy or fast inference speed. <font color="orange"> Use this to train on your own datasets! </font> |
|
||||
| [11_exploring_hyperparameters.ipynb](.notebooks/11_exploring_hyperparameters.ipynb)| Advanced notebook to find optimal parameters by doing an exhaustive grid search. |
|
||||
| deployment/[01_deployment_on_azure_container_instances.ipynb](.notebooks/11_exploring_hyperparameters.ipynb)| Notebook showing how to deploy a trained model as REST API using Azure Container Instances. |
|
||||
|
||||
## Getting Started
|
||||
|
||||
|
@ -31,6 +43,12 @@ To setup on your local machine:
|
|||
```
|
||||
2. Run the [Webcam Image Classification Notebook](notebooks/00_webcam.ipynb) notebook under the notebooks folder. Make sure to change the kernel to "Python (cvbp)".
|
||||
|
||||
|
||||
## Frequently asked questions
|
||||
|
||||
Answers to Frequently Asked Questions such as "How many images do I need to train a model?" or "How to annotate images?" can be found in the [FAQ.md](FAQ.md) file.
|
||||
|
||||
|
||||
## Coding guidelines
|
||||
|
||||
Variable naming should be consistent, i.e. an image should always be called "im" and not "i", "img", "imag", "image", etc. Since we take a strong dependency on fast.ai, variable naming should follow the standards of fast.ai which are described in this [abbreviation guide](https://docs.fast.ai/dev/abbr.html). The one exception to this guide is that variable names should be as self-explanatory as possible. For example, the meaning of the variable "batch_size" is clear, compared to using "bs" to refer to batch size.
|
||||
|
@ -55,23 +73,3 @@ The main variables and abbreviations are given in the table below:
|
|||
| lines,strings | List of strings
|
||||
| list1D | List of items, not necessarily strings
|
||||
| -s | Multiple of something (plural) should be indicated by appending an "s" to an abbreviation.
|
||||
|
||||
## Notebooks
|
||||
|
||||
We provide several notebooks to show how image classification algorithms can be designed, evaluated and operationalized.
|
||||
|
||||
1. [Webcam](.notebooks/00_webcam.ipynb)
|
||||
|
||||
An introduction to image classification.
|
||||
|
||||
1. [Intro to training image classification models](.notebooks/01_training_introduction.ipynb)
|
||||
|
||||
Introduction to training an Image classification model.
|
||||
|
||||
1. TODO
|
||||
|
||||
## Appendix
|
||||
|
||||
1. [Fastai course v3](https://github.com/fastai/course-v3)
|
||||
|
||||
|
||||
|
|
|
@ -1,19 +1,5 @@
|
|||
(This document is up-to-date as of 3/27/2019)
|
||||
|
||||
# Overview of Azure's Computer Vision Offerings
|
||||
[Microsoft Azure](https://azure.microsoft.com/en-us/) provides a variety of options when it comes to computer vision.
|
||||
The outline below provides an overview of such services, starting with the highest level service where you simply
|
||||
consume an API to the lowest level service where you develop the model and the infrastructure required to deploy it.
|
||||
|
||||
This document covers following topics:
|
||||
|
||||
* [What is Computer Vision](#What-is-Computer-Vision)
|
||||
* [Computer Vision and Machine Learning Services in Azure](#Computer-Vision-and-Machine-Learning-Services-in-Azure)
|
||||
- [Cognitive Services](#Cognitive-Services)
|
||||
- [Custom Vision Service](#Custom-Vision-Service)
|
||||
- [Azure Machine Learning Service](#Azure-Machine-Learning-Service)
|
||||
* [What Should I Use?](#What-Should-I-Use?)
|
||||
|
||||
|
||||
## What is Computer Vision
|
||||
Computer vision is one of the most popular disciplines in industry and academia nowadays that aims to train computers
|
||||
|
@ -28,14 +14,6 @@ Click on the following topics to see more details:
|
|||
<details>
|
||||
<summary><strong>Image Classification</strong></summary>
|
||||
|
||||
A large number of problems in the computer vision domain can be solved using image classification approaches.
|
||||
These include building models which answer questions such as, *"Is an OBJECT present in the image?"*
|
||||
(where OBJECT could for example be "dog", "car", "ship", etc.) as well as more complex questions, like
|
||||
*"What class of eye disease severity is evinced by this patient's retinal scan?"*
|
||||
|
||||
Image classification can be further categorized into **single-label** and **multi-label** classifications
|
||||
depending on whether a target image contains a single object class or multiple objects of different classes.
|
||||
|
||||
|
||||
<img src="https://cvbp.blob.core.windows.net/public/images/document_images/example_single_classification.png" width="600"/>
|
||||
|
||||
|
@ -52,7 +30,7 @@ depending on whether a target image contains a single object class or multiple o
|
|||
<summary><strong>Image Similarity</strong></summary>
|
||||
|
||||
Retail companies want to show customers products which are similar to the ones bought in the past.
|
||||
Or companies with large amounts of data want to organize and search their images effectively.
|
||||
Or companies with large amounts of data want to organize and search their images effectively.
|
||||
Image similarity detection can solve such interesting problems.
|
||||
|
||||
<img src="https://cvbp.blob.core.windows.net/public/images/document_images/example_image_similarity.jpg" width="600"/>
|
||||
|
@ -90,64 +68,3 @@ assigning a label to every pixel in an image such that pixels with the same labe
|
|||
<i>An example of image segmentation</i><br>
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
## Computer Vision and Machine Learning Services in Azure
|
||||
|
||||
#### Cognitive Services
|
||||
[Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) allow you to consume
|
||||
machine learning hosted services. Within Cognitive Services API, there are several
|
||||
[computer vision services](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/):
|
||||
|
||||
- [Face API](https://azure.microsoft.com/en-us/services/cognitive-services/face/):
|
||||
Face detection, person identification and emotion recognition
|
||||
- [Content Moderator](https://azure.microsoft.com/en-us/services/cognitive-services/content-moderator/):
|
||||
Image, text and video moderation
|
||||
- [Computer Vision](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/):
|
||||
Analyzing images, reading text and handwriting, identifying celebrities, and intelligently generating thumbnails
|
||||
- [Video Indexer](https://azure.microsoft.com/en-us/services/media-services/video-indexer/):
|
||||
Analyzing videos
|
||||
|
||||
Targeting popular and specific use cases, these services can be consumed with easy to use APIs.
|
||||
Users do not have to do any modeling or understand any machine learning concepts. They simply need to pass an image
|
||||
or video to the hosted endpoint, and consume the results that are returned.
|
||||
|
||||
Note, for these Cognitive Services, the models are pretrained and cannot be modified.
|
||||
|
||||
#### Custom Vision Service
|
||||
[Custom Vision Service](https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/)
|
||||
is a SaaS service where you can train your own vision models with minimal machine learning knowledge.
|
||||
Upload labelled training images through the browser application or through their APIs and the Custom Vision Service
|
||||
will help you train and evaluate your model. Once you are satisfied with your model's performance, the model will be
|
||||
ready for consumption as an endpoint.
|
||||
|
||||
Currently, the Custom Vision Service can do image classification (multi-class + multi-label) and object detection scenarios.
|
||||
|
||||
#### Azure Machine Learning Service
|
||||
[Azure Machine Learning service (AzureML)](https://azure.microsoft.com/en-us/services/machine-learning-service/)
|
||||
is a scenario-agnostic machine learning service that will help users accelerate training and deploying
|
||||
machine learning models. Use automated machine learning to identify suitable algorithms and tune hyperparameters faster.
|
||||
Improve productivity and reduce costs with autoscaling compute and DevOps for machine learning.
|
||||
Seamlessly deploy to the cloud and the edge with one click. Access all these capabilities from your favorite
|
||||
Python environment using the latest open-source frameworks, such as PyTorch, TensorFlow, and scikit-learn.
|
||||
|
||||
---
|
||||
|
||||
## What Should I Use?
|
||||
When it comes to doing computer vision on Azure, there are many options and it can be confusing to figure out
|
||||
what services to use.
|
||||
|
||||
One approach is to see if the scenario you are solving for is one that is covered by one of the Cognitive Services APIs.
|
||||
If so, you can start by using those APIs and determine if the results are performant enough. If they are not,
|
||||
you may consider customizing the model with the Custom Vision Service, or building your own model using
|
||||
Azure Machine Learning service.
|
||||
|
||||
Another approach is to determine the degree of customizability and fine tuning you want.
|
||||
Cognitive Services APIs provide no flexibility. The Custom Vision Service provides flexibility insofar as being able to
|
||||
choose what kind of training data to use (it is also limited to solving classification and object detection problems).
|
||||
Azure Machine Learning service provides complete flexibility, letting you set hyperparameters, select model architectures
|
||||
(or build your own), and perform any manipulation needed at the framework (pytorch, tensorflow, cntk, etc) level.
|
||||
|
||||
One consideration is that more customizability also translates to more responsibility.
|
||||
When using Azure Machine Learning service, you get the most flexibility, but you will be responsible for making sure
|
||||
the models are performant and deploying them.
|
||||
|
|