small text edits to the 02 notebook

2019-11-27 15:50:08 +00:00 · 2019-11-27 15:50:08 +00:00 · 5022538b66
--- a/scenarios/detection/02_mask_rcnn.ipynb
+++ b/scenarios/detection/02_mask_rcnn.ipynb
@ -41,7 +41,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "[Mask R-CNN](https://arxiv.org/abs/1703.06870) is an instance segmentation alrogithm based on top of [Faster R-CNN](https://arxiv.org/abs/1506.01497) and adds an extra branch for predicting segmentation masks for objects (instances).  That is, the same feature map for training the RPN (Region Proposal Network) and classifier in Faster R-CNN is also used in Mask R-CNN by a FCN (Fully Convolutional Network) to predict a binary mask for the object inside a bounding box.\n",
+    "[Mask R-CNN](https://arxiv.org/abs/1703.06870) is an instance segmentation algorithm built on top of [Faster R-CNN](https://arxiv.org/abs/1506.01497) that adds an extra branch for predicting segmentation masks for object instances.  That is, the same feature map used for training the RPN (Region Proposal Network) and classifier in Faster R-CNN is also used in Mask R-CNN by an FCN (Fully Convolutional Network) to predict a binary mask for the object inside a bounding box.\n",
    "\n",
    "<img src=\"./media/mask-r-cnn-framework.png\" width=\"600\"/>"
   ]
@ -153,10 +153,10 @@
   "source": [
    "## Browse the Dataset\n",
    "\n",
-    "We are going to use the [odFridgeObjects-mask datasets](https://cvbp.blob.core.windows.net/public/datasets/object_detection/odFridgeObjectsMask.zip) for illustration.  The dataset has already downloaded and unzipped into `DATA_PATH`.  This dataset includes 31 images of 4 class labels: `can`, `carton`, `milk_bottle` and `water_bottle`.\n",
+    "We are going to use the [odFridgeObjectsMask dataset](https://cvbp.blob.core.windows.net/public/datasets/object_detection/odFridgeObjectsMask.zip) for illustration.  The dataset is already downloaded and unzipped into `DATA_PATH`.  This dataset includes 31 images covering 4 class labels: `can`, `carton`, `milk_bottle` and `water_bottle`.\n",
    "\n",
    "```\n",
-    "odFridgeObjects-mask/\n",
+    "odFridgeObjectsMask/\n",
    "├── annotations\n",
    "│   ├── 1.xml\n",
    "│   ├── 2.xml\n",
@ -182,7 +182,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The `images` and `segmentation-masks` directory contain original images and their corresponding masks.  The annotations in the `annotations` directory are of format [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) shown as in [01_training_introduction notebook](01_training_introduction.ipynb)."
+    "The `images` and `segmentation-masks` directories contain the images and their corresponding masks.  The files in the `annotations` directory are in [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) format, as explained in the [01_training_introduction notebook](01_training_introduction.ipynb)."
   ]
  },
  {
@ -212,9 +212,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Masks are grayscale images where the value of pixels belong to a specific object are the object id indexed from 1.\n",
+    "Masks are grayscale images where each pixel is either 0, i.e. background, or an object id indexed from 1.\n",
    "\n",
-    "**NOTE**: When preparing the dataset, we must make sure the order of objects in the mask image be the same as in the annotation file.  In other words, for the following image, the value of `water_bottle`'s pixels must be 1 and that of `milk_bottle` must be 2."
+    "**NOTE**: When preparing the dataset, the order of objects in a mask image (i.e. the object ids) has to be the same as in the respective annotation file."
   ]
  },
  {
@ -259,7 +259,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We will use the pretrained [Mask R-CNN ResNet-50 FPN](https://pytorch.org/docs/stable/torchvision/models.html#mask-r-cnn) in PyTorch for instance segmentation.  As described on [this torchvision page](https://pytorch.org/docs/stable/torchvision/models.html#mask-r-cnn), the model is pretrained on [COCO train2017](http://images.cocodataset.org/zips/train2017.zip) (18GB).  It expects a list of images as `List[Tensor[C, H, W]]` in the range of `0-1` and returns the predictions as `List[Dict[Tensor]]`.  The fields of the `Dict` include `scores`, `labels`, `boxes` and `masks`, each of which is of the same length as the input image list.  The `labels` belong to the 91 categories in the [COCO datasets](http://cocodataset.org/).\n",
+    "We will use a [Mask R-CNN](https://pytorch.org/docs/stable/torchvision/models.html#mask-r-cnn) model which was pre-trained on the [COCO](http://images.cocodataset.org/zips/train2017.zip) dataset (18GB, see [this torchvision page](https://pytorch.org/docs/stable/torchvision/models.html#mask-r-cnn)). The `predict()` function outputs a dictionary with the keys `scores`, `labels`, `boxes` and `masks`.  The `labels` belong to the 91 categories in the [COCO dataset](http://cocodataset.org/).\n",
    "\n",
    "Similar to [01_training_introduction notebook](01_training_introduction.ipynb), we can use `get_pretrained_maskrcnn()` to get the pretrained Mask R-CNN model to create a `DetectionLearner`."
   ]
@ -301,7 +301,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The model detected the unexpeced \"dining table\" and mistaken the \"carton\" as a \"book\".  That is because the [COCO](http://cocodataset.org/) has a class called \"dining table\" and does not include the class \"carton\"."
+    "As can be seen above, the model mistakes the milk carton for a book, since COCO does not have a \"carton\" class. It also detects the counter as a \"dining table\"."
   ]
  },
  {
@ -315,7 +315,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Since there are no \"carton\", \"milk bottle\" and \"water bottle\" in the labels of COCO datasets, we need to fine-tune the pre-trained Mask R-CNN model for odFridgeObjects-mask with the 4 labels and get rid of the other unexpected categories, such as \"dining table\".  Following the practice in our [01_training_introduction notebook](01_training_introduction.ipynb), we need to prepare the `DetectionDataset` to be used by the `DetectionLearner` with a customized Mask R-CNN model.  To prepare a custom dataset, there should be a separate directory containing the masks demontrated in the above."
+    "Since COCO does not contain the classes \"carton\", \"milk bottle\" or \"water bottle\", we need to fine-tune the pre-trained Mask R-CNN model on our fridge object dataset. This will also remove the other unexpected categories, such as \"dining table\".  Following the practice in our [01_training_introduction notebook](01_training_introduction.ipynb), we need to prepare the `DetectionDataset` to be used by the `DetectionLearner` during model training."
   ]
  },
  {
@ -329,7 +329,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "To load the data, we need to create a Dataset object class that Torchvision knows how to use.  To make it more convinient, we've created a DetectionDataset class that knows how to extract annotation information from the Pascal VOC format and meet the requirements of the Torchvision dataset object class.  There is an additional parameter `mask_dir` for specifying the mask directory."
+    "To load the data, we need to create a dataset object that Torchvision knows how to use.  To make this more convenient, we wrote the `DetectionDataset` class, which also knows how to extract annotation information from the Pascal VOC format.  There is an additional parameter `mask_dir` for specifying the mask directory."
   ]
  },
  {
@ -364,7 +364,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We provide the `get_pretrained_maskrcnn()` function to facilitate the customization of Mask R-CNN model with a ResNet-50-FPN backbone."
+    "We provide the `get_pretrained_maskrcnn()` function to facilitate loading a pre-trained Mask R-CNN model with a ResNet-50-FPN backbone."
   ]
  },
  {
@ -511,7 +511,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Using the re-trained model, we can predict again."
+    "Given the model trained on our fridge objects dataset, we can now run inference again on the image used at the start of this notebook. The two objects are now found correctly, with tightly fitting segmentation masks."
   ]
  },
  {
@ -552,7 +552,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Now we have `milk_bottle` and `can` in the labels of the model.  However, because there are only 31 images annotated with masks, this result is not ideal."
+    "Note that our dataset is kept very small so that this notebook runs quickly. Hence the trained model would likely not generalize well to unseen object appearances or backgrounds."
   ]
  },
  {