* Add mask annotation tool

* Update mask annotation explanation and add conversion scripts

* Add screenshots of Labelbox annotation

* Rearrange screenshots

* Move conversion script into functions in data.py

* Point out annotation conversion scripts clearly in notebook

* Refine annotation conversion scripts

* Fix bugs

* Add tests for labelbox format conversion methods
This commit is contained in:
Simon Zhao 2019-12-09 10:07:33 +08:00 committed by Young Park
Parent 6fec792549
Commit cf68803d35
6 changed files with 591 additions and 5 deletions

View file

@@ -214,7 +214,7 @@
"source": [
"Masks are grayscale images where each pixel is either 0, ie background, or an object id indexed from 1.\n",
"\n",
"**NOTE**: When preparing the dataset, the order of objects in a mask image (ie the object ids) has to be the same as in the respective annotation file."
"**NOTE**: When preparing the dataset, the order of objects in a mask image (ie the object ids) has to be the same as in the respective annotation file. In preparing the dataset, we used [Labelbox](https://labelbox.com/) for mask annotation. The masks from Labelbox's annotation can be extracted by using our `extract_masks_from_labelbox_json()` which can be imported from `utils_cv.detection.data`."
]
},
{

View file

@@ -38,6 +38,30 @@ Annotated object locations are required to train and evaluate an object detector
When creating a new project in VOTT, note that the "source connection" can simply point to a local folder which contains the images to be annotated, and respectively the "target connection" to a folder where to write the output. Pascal VOC style annotations can be exported by selecting "Pascal VOC" in the "Export Settings" tab and then using the "Export Project" button in the "Tags Editor" tab.
For mask (segmentation) annotation, an easy-to-use online tool is
[Labelbox](https://labelbox.com/). Other alternatives include
[CVAT](https://github.com/opencv/cvat) and
[RectLabel](https://rectlabel.com/) (Mac only).
<p align="center"> <img src="media/labelbox_mask_annotation.png"
width="600"/> </p>
A good demo can be found in [Introducing Image Segmentation at
Labelbox](https://labelbox.com/blog/introducing-image-segmentation/).
Besides masks, Labelbox can also be used to annotate bounding boxes,
polylines and keypoints.
<p align="center">
<img src="media/labelbox_keypoint_annotation.png" width="600"/>
</p>
However, free accounts are limited in the number of images that can be
labeled per year, and Labelbox does not provide export options for COCO
or PASCAL VOC. Annotations from Labelbox therefore need to be converted
into the format used in our notebooks, as explained in our [Mask R-CNN
notebook](02_mask_rcnn.ipynb) and sketched below.
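For example, assuming a Labelbox export saved as `labelbox_export.json` and a dataset laid out with `images` and `annotations` subfolders, the conversion could look like the following minimal sketch (file and folder names are placeholders):

```python
from utils_cv.detection.data import (
    extract_masks_from_labelbox_json,
    extract_keypoints_from_labelbox_json,
)

# masks: writes "images", "annotations" and "segmentation-masks"
# into the output folder
extract_masks_from_labelbox_json(
    "labelbox_export.json",  # JSON exported from Labelbox
    "odFridgeObjects",       # contains "images" and "annotations"
    "odFridgeObjectsMask",   # output folder
)

# keypoints: writes "images" and "annotations", with the keypoints
# added to the PASCAL VOC XML files
extract_keypoints_from_labelbox_json(
    "labelbox_keypoint_export.json",
    "odFridgeObjects",
    "odFridgeObjectsKeypoint",
)
```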
Selecting and annotating images is complex, and consistency is key. For example:
* All objects in an image need to be annotated, even if the image contains many of them. Consider removing the image if this would take too much time.
* Ambiguous images should be removed, for example if it is unclear to a human if an object is lemon or a tennis ball, or if the image is blurry, etc.
@@ -61,7 +85,7 @@ Similar to most object detection methods, R-CNN uses a deep Neural Network which
1. Given an input image
2. A large number of region proposals, aka Regions-of-Interest (ROIs), are generated.
3. These ROIs are then independently sent through the network which outputs a vector of e.g. 4096 floating point values for each ROI.
4. Finally, a classifier is learned which takes the 4096-float ROI representation as input and outputs a label and confidence for each ROI.
<p align="center">
<img src="media/rcnn_pipeline.jpg" width="600" align="center"/>
</p>
@@ -72,7 +96,7 @@ While this approach works well in terms of accuracy, it is very costly to compute
### Intersection-over-Union overlap metric
It is often necessary to measure by how much two given rectangles overlap. For example, one rectangle might correspond to the ground-truth location of an object, while the second rectangle corresponds to the estimated location, and the goal is to measure how precise the object was detected.
For this, a metric called Intersection-over-Union (IoU) is typically used. In the example below, the IoU is given by dividing the yellow area by the combined yellow and blue areas. An IoU of 1.0 corresponds to a perfect match, while an IoU of 0 indicates that the two rectangles do not overlap. Typically an IoU of 0.5 is considered a good localization. See also this [page](https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/) for a more in-depth discussion.
<p align="center">
<img src="media/iou_example.jpg" width="400" align="center"/>
</p>
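As a small illustration, the IoU of two axis-aligned boxes in `(xmin, ymin, xmax, ymax)` form can be computed in a few lines; this is a minimal sketch, not the implementation used in this repository:

```python
def iou(box_a, box_b):
    # boxes are (xmin, ymin, xmax, ymax)
    left, top = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    right, bottom = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # clamp at zero so disjoint boxes get an empty intersection
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / float(area_a + area_b - intersection)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```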

Binary data
scenarios/detection/media/labelbox_keypoint_annotation.png (new file)
Binary file not shown. Size: 638 KiB

Binary data
scenarios/detection/media/labelbox_mask_annotation.png (new file)
Binary file not shown. Size: 537 KiB

View file

@@ -1,9 +1,153 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import hashlib
import numpy as np
import pytest
import requests
from utils_cv.detection.data import coco_labels, Urls
from PIL import Image
from pathlib import Path
import xml.etree.ElementTree as ET
from utils_cv.detection.data import (
coco_labels,
Urls,
extract_keypoints_from_labelbox_json,
extract_masks_from_labelbox_json,
)
@pytest.fixture(scope="session")
def labelbox_export_data(tmp_session):
tmp_session = Path(tmp_session)
data_dir = tmp_session / "labelbox_test_data"
im_dir = data_dir / "images"
anno_dir = data_dir / "annotations"
im_dir.mkdir(parents=True, exist_ok=True)
anno_dir.mkdir(parents=True, exist_ok=True)
keypoint_json_path = tmp_session / "labelbox_keypoint.json"
mask_json_path = tmp_session / "labelbox_mask.json"
# generate dummy images and PASCAL VOC annotations
for i in range(2):
# a completely black image
im = Image.fromarray(np.zeros((500, 500, 3), dtype=np.uint8))
im.save(im_dir / f"{i}.jpg")
# a dummy PASCAL VOC annotation XML
anno_xml = """<annotation>
<folder>images</folder>
<size>
<width>500</width>
<height>500</height>
<depth>3</depth>
</size>
<object>
<name>milk_bottle</name>
<bndbox>
<xmin>100</xmin>
<ymin>100</ymin>
<xmax>199</xmax>
<ymax>199</ymax>
</bndbox>
</object>
<object>
<name>carton</name>
<bndbox>
<xmin>300</xmin>
<ymin>300</ymin>
<xmax>399</xmax>
<ymax>399</ymax>
</bndbox>
</object>
</annotation>
"""
with open(anno_dir / f"{i}.xml", "w") as f:
f.write(anno_xml)
# generate Labelbox keypoint JSON file
keypoint_json = """[{
"Label": {
"milk_bottle_p1": [{"geometry": {"x": 320,"y": 320}}],
"milk_bottle_p2": [{"geometry": {"x": 350,"y": 350}}],
"milk_bottle_p3": [{"geometry": {"x": 390,"y": 390}}],
"carton_p1": [{"geometry": {"x": 130,"y": 130}}],
"carton_p2": [{"geometry": {"x": 190,"y": 190}}]
},
"External ID": "1.jpg"}
]
"""
# Dict version of the combination of keypoint_json and anno_xml
keypoint_truth_dict = {
"folder": "images",
"size": {
"width": "500",
"height": "500",
"depth": "3",
},
"object": {
"milk_bottle": {
"bndbox": {
"xmin": "100",
"ymin": "100",
"xmax": "199",
"ymax": "199",
},
"keypoints": {
"p1": {"x": "320", "y": "320"},
"p2": {"x": "350", "y": "350"},
"p3": {"x": "390", "y": "390"},
},
},
"carton": {
"bndbox": {
"xmin": "300",
"ymin": "300",
"xmax": "399",
"ymax": "399",
},
"keypoints": {
"p1": {"x": "130", "y": "130"},
"p2": {"x": "190", "y": "190"},
},
},
},
}
with open(keypoint_json_path, "w") as f:
f.write(keypoint_json)
# generate Labelbox mask JSON file
# The dummy mask files were generated by:
# >>> im = np.zeros((500, 500, 4), dtype=np.uint8)
# >>> im[100:200, 100:200] = 255
# >>> Image.fromarray(im).save("labelbox_test_dummy_milk_bottle_mask.png")
# >>> im = np.zeros((500, 500, 4), dtype=np.uint8)
# >>> im[300:400, 300:400] = 255
# >>> Image.fromarray(im).save("labelbox_test_dummy_carton_mask.png")
mask_json = """[{
"Label": {
"objects": [
{
"value": "carton",
"instanceURI": "https://cvbp.blob.core.windows.net/public/datasets/object_detection/labelbox_test_dummy_carton_mask.png"
},
{
"value": "milk_bottle",
"instanceURI": "https://cvbp.blob.core.windows.net/public/datasets/object_detection/labelbox_test_dummy_milk_bottle_mask.png"
}
]
},
"External ID": "1.jpg"}
]
"""
with open(mask_json_path, "w") as f:
f.write(mask_json)
return data_dir, mask_json_path, keypoint_json_path, keypoint_truth_dict
def test_urls():
@@ -30,3 +174,112 @@ def test_coco_labels():
# Check total number of labels
assert len(labels) == 91
def test_extract_keypoints_from_labelbox_json(labelbox_export_data, tmp_session):
data_dir, _, keypoint_json_path, keypoint_truth_dict = labelbox_export_data
keypoint_data_dir = Path(tmp_session) / "labelbox_test_keypoint_data"
keypoint_data_dir.mkdir(parents=True, exist_ok=True)
# run extract_keypoints_from_labelbox_json()
extract_keypoints_from_labelbox_json(
keypoint_json_path,
data_dir,
keypoint_data_dir,
)
# verify keypoint data directory structure
# only 1.jpg and 1.xml are included
subdir_exts = [("annotations", "xml"), ("images", "jpg")]
assert len([str(x) for x in keypoint_data_dir.iterdir()]) == 2
for name, ext in subdir_exts:
subdir = keypoint_data_dir / name
file_paths = [x for x in subdir.iterdir()]
assert len(file_paths) == 1
assert subdir / f"0.{ext}" not in file_paths
assert subdir / f"1.{ext}" in file_paths
# verify 1.jpg
def md5sum(path):
with open(path, "rb") as f:
md5 = hashlib.md5(f.read()).hexdigest()
return md5
im_path = "images/1.jpg"
assert md5sum(data_dir / im_path) == md5sum(keypoint_data_dir / im_path)
# verify 1.xml
tree = ET.parse(keypoint_data_dir / "annotations" / "1.xml")
root = tree.getroot()
# verify "folder"
assert len(root.findall("folder")) == 1
assert root.find("folder").text == keypoint_truth_dict["folder"]
# verify "size"
assert len(root.findall("size")) == 1
size_node = root.find("size")
size_truth = keypoint_truth_dict["size"]
assert len(size_node.findall("width")) == 1
assert size_node.find("width").text == size_truth["width"]
assert size_node.find("height").text == size_truth["height"]
assert size_node.find("depth").text == size_truth["depth"]
# verify "object"
obj_nodes = root.findall("object")
obj_truths = keypoint_truth_dict["object"]
assert len(obj_nodes) == len(obj_truths)
for obj_node in obj_nodes:
obj_name = obj_node.find("name").text
# verify "bndbox"
bndbox_node = obj_node.find("bndbox")
bndbox_truth = obj_truths[obj_name]["bndbox"]
for coord in bndbox_truth:
assert bndbox_node.find(coord).text == bndbox_truth[coord]
# verify "keypoints"
kp_node = obj_node.find("keypoints")
kp_truth = obj_truths[obj_name]["keypoints"]
for kp_name in kp_truth:
p_node = kp_node.find(kp_name)
p_truth = kp_truth[kp_name]
assert p_node.find("x").text == p_truth["x"]
assert p_node.find("y").text == p_truth["y"]
def test_extract_masks_from_labelbox_json(labelbox_export_data, tmp_session):
data_dir, mask_json_path, _, _ = labelbox_export_data
mask_data_dir = Path(tmp_session) / "labelbox_test_mask_data"
mask_data_dir.mkdir(parents=True, exist_ok=True)
# run extract_masks_from_labelbox_json()
extract_masks_from_labelbox_json(
mask_json_path,
data_dir,
mask_data_dir,
)
# verify mask data directory structure
# only 1.jpg, 1.xml and 1.png are included
assert len([str(x) for x in mask_data_dir.iterdir()]) == 3
for name, ext in [
("annotations", "xml"),
("images", "jpg"),
("segmentation-masks", "png"),
]:
subdir = mask_data_dir / name
file_paths = [x for x in subdir.iterdir()]
assert len(file_paths) == 1
assert subdir / f"0.{ext}" not in file_paths
assert subdir / f"1.{ext}" in file_paths
# verify 1.jpg and 1.xml
def md5sum(path):
with open(path, "rb") as f:
md5 = hashlib.md5(f.read()).hexdigest()
return md5
for name in ["images/1.jpg", "annotations/1.xml"]:
assert md5sum(data_dir / name) == md5sum(mask_data_dir / name)
# verify 1.png
mask = np.array(Image.open(mask_data_dir / "segmentation-masks" / "1.png"))
assert mask.shape == (500, 500)
assert np.all(mask[100:200, 100:200] == 1)
assert np.all(mask[300:400, 300:400] == 2)

View file

@@ -1,8 +1,16 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
from typing import List
from typing import List, Union
from urllib.parse import urljoin
from PIL import Image
from pathlib import Path
import json
import numpy as np
import shutil
import urllib.request
import xml.etree.ElementTree as ET
class Urls:
@@ -132,3 +140,304 @@ def coco_labels() -> List[str]:
"hair drier",
"toothbrush",
]
def extract_masks_from_labelbox_json(
labelbox_json_path: Union[str, Path],
data_dir: Union[str, Path],
mask_data_dir: Union[str, Path],
) -> None:
""" Extract masks from Labelbox annotation JSON file.
It reads in an annotation JSON file created by the Labelbox annotation UI
(https://labelbox.com/), downloads the binary segmentation masks for all
objects, merges them in the order of the bounding boxes described in the
corresponding PASCAL VOC annotation file, and then writes the resultant
mask into a directory called "segmentation-masks".
The annotation files in
[odFridgeObjects](https://cvbp.blob.core.windows.net/public/datasets/object_detection/odFridgeObjects.zip)
are in the PASCAL VOC format shown in our
[01 notebook](../../scenarios/detection/01_training_introduction.ipynb).
The data structure of the export JSON file from Labelbox looks like:
```
{"Dataset Name": "odFridgeObjects",
"External ID": "117.jpg",
"Label": {"objects": [{"color": "#00D4FF",
"featureId": "ck1iu6m3suwmo0944zoufayto",
"instanceURI": "https://api.labelbox.com/masks/ck1iphg4xsqhe0944bbbiwrak",
"schemaId": "ck1ipz4v5s5rd0701j2mfc4ii",
"title": "water_bottle",
"value": "water_bottle"},
{"color": "#00FFFF",
"featureId": "ck1iuonmvryt608388vlq6t9z",
"instanceURI": "https://api.labelbox.com/masks/ck1iphg4xsqhe0944bbbiwrak",
"schemaId": "ck1ipz4v5s5re0701sojrveb3",
"title": "milk_bottle",
"value": "milk_bottle"}]},
"Labeled Data": "https://storage.labelbox.com/58d748d4418a-117.jpg",
"View Label": "https://editor.labelbox.com?project=ck1iphg4xsqhe&label=ck1iq31v1qqht086"}
```
The export file is a list of `Dict` where each `Dict` holds the
metadata for one image. Key fields include:
* **`annos[n]["External ID"]`**: Original image file name
* `annos[n]["Labeled Data"]`: URL of the original image
* `annos[n]["View Label"]`: URL of the image with labels or masks
* `annos[n]["Label"]`: Dict. Meta data of all annotations of the
image
* `annos[n]["Label"]["objects"]`: List. Meta data of all objects of
the image.
* `annos[n]["Label"]["objects"][0]["value"]`: Object name (category)
* **`annos[n]["Label"]["objects"][0]["instanceURI"]`**: URL of the
binary mask of the object, with 0 as background, 255 as the object.
Take the
[`odFridgeObjects`](https://cvbp.blob.core.windows.net/public/datasets/object_detection/odFridgeObjects.zip)
dataset as an example. Here the XML annotations are in the
`odFridgeObjects/annotations` folder and the original images are in
the `odFridgeObjects/images` folder. For an arbitrary image
`odFridgeObjects/images/xyz.jpg`, its corresponding XML annotation
file is `odFridgeObjects/annotations/xyz.xml`.
Since the only missing pieces are the masks annotated in Labelbox, all
we need to do is combine the binary masks
(`[obj["instanceURI"] for obj in annos[0]["Label"]["objects"]]`) of the
individual objects in an image (`annos[0]["External ID"]`) into a
single mask image (`annos[0]["External ID"][:-4] + ".png"`) in a
directory called `segmentation-masks`.
Args:
labelbox_json_path: mask annotation JSON file from Labelbox
data_dir: path to dataset. The path should contain the "images" and
"annotations" subdirectories which store the original images and
PASCAL VOC annotation XML files.
mask_data_dir: path to the result. It will contain a
"segmentation-masks" subdirectory as well as "images" and
"annotations". Only images with masks described in labelbox_json_path
will be stored in mask_data_dir. Mask images extracted into
"segmentation-masks" will be PNG files.
"""
src_im_dir = Path(data_dir) / "images" # image folder
src_anno_dir = Path(data_dir) / "annotations" # annotation folder
dst_im_dir = Path(mask_data_dir) / "images"
dst_anno_dir = Path(mask_data_dir) / "annotations"
dst_mask_dir = Path(mask_data_dir) / "segmentation-masks" # mask folder
# create directories for annotated dataset
dst_im_dir.mkdir(parents=True, exist_ok=True)
dst_anno_dir.mkdir(parents=True, exist_ok=True)
dst_mask_dir.mkdir(parents=True, exist_ok=True)
# read exported Labelbox annotation JSON file
with open(labelbox_json_path) as f:
annos = json.load(f)
# process one image per iteration
for anno in annos:
# get related file paths
im_name = anno["External ID"] # image file name
anno_name = im_name[:-4] + ".xml" # annotation file name
mask_name = im_name[:-4] + ".png" # mask file name
print("Processing image: {}".format(im_name))
src_im_path = src_im_dir / im_name
src_anno_path = src_anno_dir / anno_name
dst_im_path = dst_im_dir / im_name
dst_anno_path = dst_anno_dir / anno_name
dst_mask_path = dst_mask_dir / mask_name
# copy original image and annotation file
shutil.copy(src_im_path, dst_im_path)
shutil.copy(src_anno_path, dst_anno_path)
# read mask images
mask_urls = [obj["instanceURI"] for obj in anno["Label"]["objects"]]
labels = [obj["value"] for obj in anno["Label"]["objects"]]
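# download each object's mask; per the docstring above, Labelbox serves
# binary mask PNGs with 0 as background and 255 as the object, so
# thresholding the first channel at 255 yields one boolean mask per object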
binary_masks = np.array([
np.array(Image.open(urllib.request.urlopen(url)))[..., 0] == 255
for url in mask_urls
])
# rearrange masks with regard to annotation
tree = ET.parse(dst_anno_path)
root = tree.getroot()
rects = []
for obj in root.findall("object"):
label = obj.find("name").text
bnd_box = obj.find("bndbox")
left = int(bnd_box.find("xmin").text)
top = int(bnd_box.find("ymin").text)
right = int(bnd_box.find("xmax").text)
bottom = int(bnd_box.find("ymax").text)
rects.append((label, left, top, right, bottom))
assert len(rects) == len(binary_masks)
matches = []
# find matched binary mask and annotation
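# heuristic: a mask matches a box when nearly all of its "on" pixels
# fall inside the box, i.e. the matching mask is the one with the
# fewest pixels remaining after blanking out the box region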
for label, left, top, right, bottom in rects:
match = 0
min_overlap = binary_masks.shape[1] * binary_masks.shape[2]
for i, bmask in enumerate(binary_masks):
bmask_out = bmask.copy()
bmask_out[top:(bottom + 1), left:(right + 1)] = False
non_overlap = np.sum(bmask_out)
if non_overlap < min_overlap:
match = i
min_overlap = non_overlap
assert label == labels[match], \
"{}: {}".format(label, labels[match])
matches.append(match)
assert len(set(matches)) == len(matches), \
"{}: {}".format(len(set(matches)), len(matches))
binary_masks = binary_masks[matches]
# merge binary masks
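# object i is assigned id i + 1 (0 stays background); broadcasting
# multiplies each boolean mask by its id, and the pixel-wise max
# resolves any overlap in favor of the higher id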
obj_values = np.arange(len(binary_masks)) + 1
labeled_masks = binary_masks * obj_values[:, None, None]
mask = np.max(labeled_masks, axis=0).astype(np.uint8)
# save mask image
Image.fromarray(mask, mode="L").save(dst_mask_path)
def extract_keypoints_from_labelbox_json(
labelbox_json_path: Union[str, Path],
data_dir: Union[str, Path],
keypoint_data_dir: Union[str, Path],
) -> None:
""" Extract keypoints from Labelbox annotation JSON file.
It reads in an annotation JSON file created by the Labelbox annotation UI
(https://labelbox.com/), extracts the annotated keypoints for all objects,
and then writes them into the corresponding PASCAL VOC annotation file.
The data structure of the export JSON file from Labelbox looks like:
```
{"Dataset Name": "odFridgeObjects",
"External ID": "21.jpg",
"Label": {"carton_left_back_bottom": [{"geometry": {"x": 217, "y": 277}}],
"carton_left_back_shoulder": [{"geometry": {"x": 410, "y": 340}}],
"carton_left_collar": [{"geometry": {"x": 416, "y": 367}}],
"carton_left_front_bottom": [{"geometry": {"x": 161, "y": 299}}],
"carton_left_front_shoulder": [{"geometry": {"x": 359, "y": 375}}],
"carton_left_top": [{"geometry": {"x": 438, "y": 379}}],
"carton_lid": [{"geometry": {"x": 392, "y": 427}}],
"carton_right_collar": [{"geometry": {"x": 398, "y": 450}}],
"carton_right_front_bottom": [{"geometry": {"x": 166, "y": 371}}],
"carton_right_front_shoulder": [{"geometry": {"x": 350, "y": 462}}],
"carton_right_top": [{"geometry": {"x": 424, "y": 455}}],
"water_bottle_lid_left_bottom": [{"geometry": {"x": 243, "y": 444}}],
"water_bottle_lid_left_top": [{"geometry": {"x": 266, "y": 456}}],
"water_bottle_lid_right_bottom": [{"geometry": {"x": 220,
"y": 499}}],
"water_bottle_lid_right_top": [{"geometry": {"x": 243, "y": 511}}],
"water_bottle_wrapper_left_bottom": [{"geometry": {"x": 77,
"y": 344}}],
"water_bottle_wrapper_left_top": [{"geometry": {"x": 161,
"y": 379}}],
"water_bottle_wrapper_right_bottom": [{"geometry": {"x": 30,
"y": 424}}],
"water_bottle_wrapper_right_top": [{"geometry": {"x": 120,
"y": 477}}]},
"Labeled Data": "https://storage.labelbox.com/ck1ipbufauu4f072105748106f5ce6-21.jpg",
"View Label": "https://image-segmentation-v4.labelbox.com?project=ck36v24&label=ck36xdrzryw"}
```
The export file is a list of `Dict` where each `Dict` holds the
metadata for one image. Key fields include:
* **`annos[n]["External ID"]`**: Original image file name
* `annos[n]["Labeled Data"]`: URL of the original image
* `annos[n]["View Label"]`: URL of the image with labels or masks
* `annos[n]["Label"]`: Dict. Meta data of all annotations of the
image. Its keys are the labels of keypoints, and its values are the
coordinates.
* **`annos[n]["Label"]["xxx"][0]["geometry"]["x"]`**: The x coordinate
of the label `xxx`.
* **`annos[n]["Label"]["xxx"][0]["geometry"]["y"]`**: The y coordinate
of the label `xxx`.
**NOTE**: Things become tricky when multiple instances of the same
category exist in an image. For now this does not occur in the
odFridgeObjects dataset, where no more than one instance of each
category appears per image. In addition, Labelbox provides no natural
way of specifying that a point belongs to a particular object, so we
use a prefix: for example, the point labeled `carton_left_back_bottom`
is marked as belonging to a carton by the `carton_` prefix.
Args:
labelbox_json_path: keypoint annotation JSON file from Labelbox
data_dir: path to dataset. The path should contain the "images" and
"annotations" subdirectories which store the original images and
PASCAL VOC annotation XML files.
keypoint_data_dir: path to the result. It will contain the "images"
and "annotations" subdirectories. Only images with keypoints
described in labelbox_json_path will be stored in keypoint_data_dir.
The XML files in the "annotations" directory will also include the
keypoint annotations extracted from Labelbox's JSON file.
"""
# original image folder
src_im_dir = Path(data_dir) / "images"
# original annotation folder
src_anno_dir = Path(data_dir) / "annotations"
# keypoint image folder
dst_im_dir = Path(keypoint_data_dir) / "images"
# keypoint annotation folder
dst_anno_dir = Path(keypoint_data_dir) / "annotations"
# create directories for annotated dataset
dst_im_dir.mkdir(parents=True, exist_ok=True)
dst_anno_dir.mkdir(parents=True, exist_ok=True)
# read exported Labelbox annotation JSON file
with open(labelbox_json_path) as f:
annos = json.load(f)
# process one image keypoints annotation per iteration
for anno in annos:
# get related file paths
im_name = anno["External ID"] # image file name
anno_name = im_name[:-4] + ".xml" # annotation file name
print("Processing image: {}".format(im_name))
src_im_path = src_im_dir / im_name
src_anno_path = src_anno_dir / anno_name
dst_im_path = dst_im_dir / im_name
dst_anno_path = dst_anno_dir / anno_name
# copy original image
shutil.copy(src_im_path, dst_im_path)
# add keypoints annotation into PASCAL VOC XML file
kps_annos = anno["Label"]
tree = ET.parse(src_anno_path)
root = tree.getroot()
for obj in root.findall("object"):
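# Labelbox keypoint labels are prefixed with the object name, e.g.
# "carton_left_back_bottom" belongs to the "carton" object; match by
# prefix, then strip it to obtain the keypoint node name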
prefix = obj.find("name").text + "_"
# add "keypoints" node for current object
kps = ET.SubElement(obj, "keypoints")
for k in kps_annos.keys():
if k.startswith(prefix):
# add keypoint into "keypoints" node
pt = ET.SubElement(kps, k[len(prefix):])
x = ET.SubElement(pt, "x") # add x coordinate
y = ET.SubElement(pt, "y") # add y coordinate
geo = kps_annos[k][0]["geometry"]
x.text = str(geo["x"])
y.text = str(geo["y"])
# write modified annotation file
tree.write(dst_anno_path)