Updates to readmes
This commit is contained in:
Parent: d570167fb9
Commit: c30e66be2c

AUTHORS.md (10 lines changed)
@@ -7,16 +7,18 @@ Maintainers (sorted alphabetically)

Maintainers are actively supporting the project and have made substantial contributions to the repository.<br>
They have admin access to the repo and provide support reviewing issues and pull requests.

* **[Anupam Sharma](https://github.com/AnupamMicrosoft)**
  * PM support
* **[Patrick Buehler](https://github.com/PatrickBue)**
  * Tech lead
* **[JS Tan](https://github.com/jiata)**
  * Main contributor
* **[Young Park](https://github.com/youngpark/)**
  * Main contributor
* **[Jun Ki Min](https://github.com/loomlike)**
  * Main contributor
* **[Anupam Sharma](https://github.com/AnupamMicrosoft)**
  * PM support
* **[Miguel González-Fierro](https://github.com/miguelgfierro)**
  * DevOps
* **[Patrick Buehler](https://github.com/PatrickBue)**
  * Main contributor
* **[Richin Jain](https://github.com/jainr)**
  * DevOps
* **[Simon Zhao](https://github.com/simonzhaoms)**
@@ -1,7 +1,8 @@

# Contribution Guidelines

Contributions are welcome! Here are a few things to know:
We appreciate all contributions. If you are planning to contribute bug fixes, please do so without any further discussion. If you plan to contribute new features, utility functions, or extensions, please first open an issue and discuss the feature with us.

Here are a few more things to know:
- [Microsoft Contributor License Agreement](#microsoft-contributor-license-agreement)
- [Steps to Contributing](#steps-to-contributing)
- [Working with Notebooks](#working-with-notebooks)
@@ -52,17 +53,7 @@ Note: We use the staging branch to land all new features, so please remember to

## Working with Notebooks

It is challenging to do code review for notebooks in GitHub. [reviewnb](https://www.reviewnb.com/) makes it easy to review notebooks in GitHub, but it only works with public repositories. Since we are still in private mode, [jupytext](https://github.com/mwouts/jupytext) is another option: it converts IPython notebooks to multiple formats and also works with pre-commit. However, it falls short of automatically adding the converted files as part of the git commit; an [issue](https://github.com/mwouts/jupytext/issues/200) has been opened with jupytext for this. In the interim, a more reliable way is to manually convert the notebooks to Python scripts using [nbconvert](https://github.com/jupyter/nbconvert) and manually add and commit them to your branch along with the notebook. nbconvert comes pre-installed as part of the Jupyter installation; run the following command to convert a notebook to a Python script and save it in the `python` folder under the `image_classification` folder:
```
$ jupyter nbconvert --output-dir=./image_classification/python --to python ./image_classification/notebooks/mnist.ipynb
```
As you check these converted files in, we don't want to enforce black formatting and flake8 linting on them, since they also contain markdown and metadata that do not play well with these tools. After all, the main goal of converting these notebooks to Python scripts is to make diffing easier. You can commit the Python scripts generated by nbconvert by using the following option with the *git commit* command:
```
SKIP=black,flake8 git commit -m "commit message"
```
**Note:** We only want to skip the black and flake8 hooks for the nbconvert py files; for everything else these hooks should not be skipped.

When you pull updates from remote there might be merge conflicts at times; use [nbdime](https://nbdime.readthedocs.io/en/latest/) to fix them.
When you pull updates from the remote there may be merge conflicts with Jupyter notebooks. The tool [nbdime](https://nbdime.readthedocs.io/en/latest/) can help fix such problems.

* To install nbdime
  ```
  pip install nbdime
  ```
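Once installed, nbdime can also be registered with git so that notebook diffs and merges use its notebook-aware tools; a short sketch (the notebook filenames below are placeholders):

```
# Register nbdime as git's diff/merge tool for .ipynb files
nbdime config-git --enable --global

# Show a notebook-aware diff of two versions of a notebook
nbdiff mnist_old.ipynb mnist_new.ipynb
```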
SETUP.md (142 lines changed)
@@ -1,34 +1,80 @@

# Setup Guide

This document describes how to set up all the dependencies to run the notebooks in this repository.

Many computer vision scenarios are extremely computationally heavy. Training a model often requires a machine that has a GPU, and would otherwise be too slow. We recommend using the GPU-enabled [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) since it comes prepared with a lot of the prerequisites needed to efficiently do computer vision.

To scale up or to operationalize your models, we recommend setting up [Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/). Our notebooks provide instructions on how to use it.
This document describes how to set up all the dependencies, and optionally create a virtual machine, to run the notebooks in this repository.

## Table of Contents

1. [Compute Environment](#compute-environments)
1. [System Requirements](#system-requirements)
1. [Installation](#installation)
1. [System Requirements](#system-requirements)
1. [Compute Environment](#compute-environments)
1. [Tunneling](#tunneling)

## Installation

To install the repository and its dependencies, follow these steps:

1. (optional) Install Anaconda or [Miniconda](https://conda.io/miniconda.html) with Python >= 3.6. This step can be skipped if working on a Data Science Virtual Machine (see the compute environment section).

1. Clone the repository
   ```
   git clone https://github.com/Microsoft/computervision-recipes
   ```
1. Install the conda environment; you'll find the `environment.yml` file in the root directory. To build the conda environment:
   ```
   cd computervision-recipes
   conda env create -f environment.yml
   ```
1. Activate the conda environment and register it with Jupyter:
   ```
   conda activate cv
   python -m ipykernel install --user --name cv --display-name "Python (cv)"
   ```
1. Start the Jupyter notebook server
   ```
   jupyter notebook
   ```
1. At this point, you should be able to run the notebooks within the various [scenarios](scenarios) folders.
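Taken together, the installation steps above can be run as a single shell session (assuming git and conda are already installed):

```
git clone https://github.com/Microsoft/computervision-recipes
cd computervision-recipes
conda env create -f environment.yml
conda activate cv
python -m ipykernel install --user --name cv --display-name "Python (cv)"
jupyter notebook
```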

__pip install__

As an alternative to the steps above, if you only want to install the `utils_cv` library (without creating a new conda environment), this can be done using pip install. Note that this does not download the notebooks.

```bash
pip install git+https://github.com/microsoft/ComputerVision.git@master#egg=utils_cv
```
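To check that the pip installation succeeded, you can try importing the library from the command line; printing the module path is one simple sanity check:

```
python -c "import utils_cv; print(utils_cv.__file__)"
```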

## System Requirements

__Requirements__

* A machine running Linux (suggested) >= 16.04 LTS or Windows
* [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or Anaconda with Python version >= 3.6.
  * This is pre-installed on the Azure DSVM, so the following steps can be run directly.
  * It is recommended to update conda to the latest version: `conda update -n base -c defaults conda`

Note that PyTorch runs slower on Windows than on Linux. This is a known [issue](https://github.com/pytorch/pytorch/issues/12831) which affects model training and is due to parallelized data loading. For compute-heavy training tasks (e.g. object detection) on standard GPUs this difference is typically below 10%. For image classification, however, with a fast GPU (e.g. V100) and potentially large images, training on Windows can be multiple times slower than on Linux.

__Dependencies__

Make sure you have CUDA Toolkit version 9.0 or above installed on your machine. You can run the command below in your terminal to check:

```
nvcc --version
```

If you don't have the CUDA Toolkit or don't have the right version, please download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
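On machines with an NVIDIA driver installed, `nvidia-smi` is another quick check; it reports the driver version, the highest CUDA version the driver supports, and current GPU utilization:

```
nvidia-smi
```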

## Compute Environments

Most computer vision scenarios require a GPU, especially if you're training a custom model. We recommend using a virtual machine to run the notebooks on; specifically, we'll want one with a powerful GPU. NVIDIA's Tesla V100 is a good choice that can be found in most Azure regions.
Many computer vision scenarios are extremely computationally heavy. Training a model often requires a machine that has a strong GPU, and would otherwise be too slow.

The easiest way to get started is to use the [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). This VM comes installed with all the system requirements that are needed to run the notebooks in this repository. If you choose this option, you can skip the [System Requirements](#system-requirements) step in this guide, as those requirements come pre-installed on the DSVM.

Before creating your Azure DSVM, you need to decide what kind of VM size you want. Some VMs have GPUs, some have multiple GPUs, and some don't have any GPUs at all. For this repo, we recommend selecting an Ubuntu VM of the size [Standard_NC6_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series). The Standard_NC6_v3 uses the NVIDIA Tesla V100, which will help us train our computer vision models and iterate quickly.

For users new to Azure, your subscription may not come with a quota for GPUs. You may need to go into the Azure portal to increase your quota for GPU VMs. Learn more about how to do this here: https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits.

@@ -47,71 +93,13 @@ resources.

__Virtual Machine Builder__

One easy way to create your DSVM is to use the [VM Builder](contrib/vm_builder) tool located inside the 'contrib' folder in the root directory of the repo. Simply run `python contrib/vm_builder/vm_builder.py` at the root level of the repo, and this tool will preconfigure your virtual machine with the appropriate settings for working with this repository.
One easy way to create your DSVM is to use the [VM Builder](contrib/vm_builder) tool located inside the 'contrib' folder in the root directory of the repo. Note that this tool only runs on Linux and Mac, is not well maintained, and might stop working. Simply run `python contrib/vm_builder/vm_builder.py` at the root level of the repo, and this tool will preconfigure your virtual machine with the appropriate settings for working with this repository.

> NOTE: the VM builder only works on Linux and Mac.

## System Requirements

__Requirements__

* A machine running Linux >= 16.04 LTS or Windows
* Miniconda or Anaconda with Python version >= 3.6.
  * This is pre-installed on the Azure DSVM, so the following steps can be run directly. To set up your local machine, [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a quick way to get started.
  * It is recommended to update conda to the latest version: `conda update -n base -c defaults conda`

> NOTE: For Image Classification, Windows is up to 10x slower in training than Linux. You can set `num_workers=0`, but even then it will be up to 2x slower.

> NOTE: For Object Detection, Windows is about 20% slower in training but about the same speed for inference.

__Dependencies__

Make sure you have CUDA Toolkit version 9.0 or above installed on your machine. You can run the command below in your terminal to check:

```
nvcc --version
```

If you don't have the CUDA Toolkit or don't have the right version, please download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)

## Installation

To install the repo and its dependencies, perform the following steps:

1. Install Anaconda or [Miniconda](https://conda.io/miniconda.html) with Python >= 3.6. This step can be skipped if working on a Data Science Virtual Machine.
1. Clone the repository
   ```
   git clone https://github.com/Microsoft/ComputerVision
   ```
1. Install the conda environment; you'll find the `environment.yml` file in the root directory. To build the conda environment:
   ```
   conda env create -f environment.yml
   ```
1. Activate the conda environment and register it with Jupyter:
   ```
   conda activate cv
   python -m ipykernel install --user --name cv --display-name "Python (cv)"
   ```
1. Start the Jupyter notebook server
   ```
   jupyter notebook
   ```
1. At this point, you should be able to run the notebooks within the various [scenarios](scenarios) folders.

__pip install__

As an alternative to the steps above, if you only want to install the `utils_cv` library (without creating a new conda environment), this can be done using pip install:

```bash
pip install git+https://github.com/microsoft/ComputerVision.git@master#egg=utils_cv
```

> NOTE: if you install this repo using this method, you will not have the notebooks loaded by default.

## Tunneling

If your compute environment is on a VM in the cloud, you can open a tunnel from your VM to your local machine using the following command:
If your compute environment is on a Linux VM in the cloud, you can open a tunnel from your VM to your local machine using the following command:
```
$ ssh -L local_port:remote_address:remote_port <username>@<server-ip>
```
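For example, to reach a Jupyter server listening on port 8888 of the VM from a local browser (the username and IP below are placeholders):

```
ssh -L 8888:localhost:8888 myuser@<vm-ip>
```

With the tunnel open, browse to http://localhost:8888 on your local machine.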