This commit is contained in:
PatrickBue 2020-06-26 18:58:36 +00:00 committed by GitHub
Parent d570167fb9
Commit c30e66be2c
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
3 changed files: 74 additions and 93 deletions


@@ -7,16 +7,18 @@ Maintainers (sorted alphabetically)
Maintainers are actively supporting the project and have made substantial contributions to the repository.<br>
They have admin access to the repo and provide support reviewing issues and pull requests.
* **[Anupam Sharma](https://github.com/AnupamMicrosoft)**
  * PM support
* **[JS Tan](https://github.com/jiata)**
  * Main contributor
* **[Jun Ki Min](https://github.com/loomlike)**
  * Main contributor
* **[Miguel González-Fierro](https://github.com/miguelgfierro)**
  * DevOps
* **[Patrick Buehler](https://github.com/PatrickBue)**
  * Tech lead
* **[Richin Jain](https://github.com/jainr)**
  * DevOps
* **[Simon Zhao](https://github.com/simonzhaoms)**
* **[Young Park](https://github.com/youngpark/)**
  * Main contributor


@@ -1,7 +1,8 @@
# Contribution Guidelines
We appreciate all contributions. If you are planning to contribute bug fixes, please do so without any further discussion. If you plan to contribute new features, utility functions, or extensions, please first open an issue and discuss the feature with us.
Here are a few more things to know:
- [Microsoft Contributor License Agreement](#microsoft-contributor-license-agreement)
- [Steps to Contributing](#steps-to-contributing)
- [Working with Notebooks](#working-with-notebooks)
@@ -52,17 +53,7 @@ Note: We use the staging branch to land all new features, so please remember to
## Working with Notebooks
Code review for notebooks in GitHub is challenging. [reviewnb](https://www.reviewnb.com/) makes it easy to review notebooks in GitHub, but it only works with public repositories. Since we are still in private mode, [jupytext](https://github.com/mwouts/jupytext) is another option: it converts IPython notebooks to multiple formats and also works with pre-commit. However, it falls short of automatically adding the converted files as part of the git commit; an [issue](https://github.com/mwouts/jupytext/issues/200) has been opened with jupytext for this. In the interim, a more reliable way is to manually convert the notebooks to Python scripts using [nbconvert](https://github.com/jupyter/nbconvert) and manually add and commit them to your branch along with the notebooks. nbconvert comes pre-installed as part of the Jupyter installation; run the following command to convert a notebook to a Python script and save it in the `python` folder under the `image_classification` folder:
```
$ jupyter nbconvert --output-dir=./image_classification/python --to python ./image_classification/notebooks/mnist.ipynb
```
When checking these converted files in, we don't want to enforce black formatting and flake8 linting on them: they contain markdown and metadata that does not play well with these tools, and the main goal of converting the notebooks to Python scripts is to make diffing easier, not to produce clean code. You can commit the scripts generated by nbconvert by using the following option with the *git commit* command:
```
SKIP=black,flake8 git commit -m "commit message"
```
**Note:** We only want to skip the black and flake8 hooks for the nbconvert-generated .py files; for everything else these hooks should not be skipped.
When you pull updates from remote, there might be merge conflicts with Jupyter notebooks. The tool [nbdime](https://nbdime.readthedocs.io/en/latest/) can help fix such problems.
* To install nbdime:
```
pip install nbdime
```

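Once installed, nbdime can be wired into git so that diffs and merges of `.ipynb` files become content-aware. A minimal sketch of the standard nbdime entry points; the notebook filenames below are placeholders:

```shell
# Enable nbdime's git integration globally, so `git diff` and
# `git merge` use notebook-aware drivers for .ipynb files.
nbdime config-git --enable --global

# Show a content-aware diff between two versions of a notebook
# (filenames are placeholders, substitute your own).
nbdiff base.ipynb remote.ipynb

# Launch a web-based three-way merge tool for a conflicted notebook.
nbmerge-web base.ipynb local.ipynb remote.ipynb
```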
SETUP.md

@@ -1,30 +1,76 @@
# Setup Guide
Many computer vision scenarios are extremely computationally heavy. Training a model often requires a machine that has a GPU, and would otherwise be too slow. We recommend using the GPU-enabled [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) since it comes prepared with a lot of the prerequisites needed to efficiently do computer vision.

To scale up or to operationalize your models, we recommend setting up [Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/). Our notebooks provide instructions on how to use it.

This document describes how to set up all the dependencies, and optionally create a virtual machine, to run the notebooks in this repository.
## Table of Contents

1. [Installation](#installation)
1. [System Requirements](#system-requirements)
1. [Compute Environment](#compute-environments)
1. [Tunneling](#tunneling)
## Installation
To install the repository and its dependencies, follow these steps:
1. (optional) Install [Miniconda](https://conda.io/miniconda.html) or Anaconda with Python >= 3.6. This step can be skipped if working on a Data Science Virtual Machine (see the [Compute Environments](#compute-environments) section).
1. Clone the repository
```
git clone https://github.com/Microsoft/computervision-recipes
```
1. Create the conda environment; you'll find the `environment.yml` file in the root directory of the repository:
```
cd computervision-recipes
conda env create -f environment.yml
```
1. Activate the conda environment and register it with Jupyter:
```
conda activate cv
python -m ipykernel install --user --name cv --display-name "Python (cv)"
```
1. Start the Jupyter notebook server
```
jupyter notebook
```
1. At this point, you should be able to run the notebooks within the various [scenarios](scenarios) folders.
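As a quick sanity check (not part of the official steps), you can confirm that the environment and kernel from the steps above were set up correctly:

```shell
# The "cv" environment should appear in the list of conda environments.
conda env list

# The "cv" kernel registered above should appear among Jupyter's kernels.
jupyter kernelspec list
```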
__pip install__

As an alternative to the steps above, if you only want to install the `utils_cv` library (without creating a new conda environment), you can do so using pip. Note that this does not download the notebooks.
```bash
pip install git+https://github.com/microsoft/ComputerVision.git@master#egg=utils_cv
```
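After the pip install completes, a quick way to confirm the library is usable (assuming the package is importable as `utils_cv`, matching the egg name above):

```shell
# Import the freshly installed library and print where it was installed.
python -c "import utils_cv; print(utils_cv.__file__)"
```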
## System Requirements
__Requirements__
* A machine running Linux (suggested) >= 16.04 LTS, or Windows
* [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or Anaconda with Python version >= 3.6
  * Miniconda comes pre-installed on the Azure DSVM, so the steps above can be run directly.
  * It is recommended to update conda to the latest version: `conda update -n base -c defaults conda`
Note that PyTorch runs slower on Windows than on Linux. This is a known [issue](https://github.com/pytorch/pytorch/issues/12831) which affects model training and is due to parallelized data loading. For compute-heavy training tasks (e.g. object detection) on standard GPUs, this difference is typically below 10%. For image classification, however, with a fast GPU (e.g. V100) and potentially large images, training on Windows can be multiple times slower than on Linux.
__Dependencies__
Make sure you have CUDA Toolkit version 9.0 or above installed on your machine. You can run the command below in your terminal to check.
```
nvcc --version
```
If you don't have the CUDA Toolkit or don't have the right version, please download it here: [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit).
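If `nvcc` is not found, the CUDA Toolkit is either missing or not on your PATH. As a secondary check, `nvidia-smi` only requires the GPU driver, and reports the GPU model along with the highest CUDA version the installed driver supports:

```shell
# Query the GPU driver; works even when the CUDA Toolkit is not installed.
nvidia-smi
```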
## Compute Environments
Many computer vision scenarios are extremely computationally heavy, especially if you're training a custom model, and would otherwise be too slow. We recommend running the notebooks on a virtual machine with a strong GPU; the NVIDIA Tesla V100 is a good choice that can be found in most Azure regions.

The easiest way to get started is to use the [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). This VM comes with all the system requirements needed to run the notebooks in this repository. If you choose this option, you can skip the [System Requirements](#system-requirements) step in this guide, as those requirements come pre-installed on the DSVM.
@@ -47,71 +93,13 @@ resources.
__Virtual Machine Builder__

One easy way to create your DSVM is to use the [VM Builder](contrib/vm_builder) tool located inside the `contrib` folder in the root directory of the repo. Note that this tool only runs on Linux and Mac, is not well maintained, and might stop working. Simply run `python contrib/vm_builder/vm_builder.py` at the root level of the repo, and the tool will preconfigure your virtual machine with the appropriate settings for working with this repository.
## Tunneling
If your compute environment is on a Linux VM in the cloud, you can open a tunnel from your VM to your local machine using the following command:
```
$ ssh -L local_port:remote_address:remote_port <username>@<server-ip>
```
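For example, to reach a Jupyter server running on the VM (assuming Jupyter's default port 8888; the username and server IP below are placeholders):

```shell
# Forward local port 8888 to port 8888 on the VM, so that opening
# http://localhost:8888 in a local browser reaches the remote Jupyter server.
ssh -L 8888:localhost:8888 myusername@<server-ip>
```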