update to python3, pytorch0.4.1, refactor code

Bugra Tekin 2019-10-18 18:45:05 +02:00
Parent 766d5eced2
Commit 283d13645a
67 changed files: 8690 additions and 1933 deletions

View file

@ -1,9 +1,11 @@
# SINGLESHOTPOSE
This is the code for the following paper:
This is the development version of the code for the following paper:
Bugra Tekin, Sudipta N. Sinha and Pascal Fua, "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018.
The original repository for the above paper can be found at the following [link](https://github.com/Microsoft/singleshotpose/).
### Introduction
We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. The key component of our method is a new CNN architecture inspired by the YOLO network design that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. [Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Tekin_Real-Time_Seamless_Single_CVPR_2018_paper.pdf), [arXiv](https://arxiv.org/abs/1711.08848)
@ -13,7 +15,7 @@ We propose a single-shot approach for simultaneously detecting an object in an R
#### Citation
If you use this code, please cite the following
> @inproceedings{tekin18,
      TITLE = {{Real-Time Seamless Single Shot 6D Object Pose Prediction}},
      TITLE = {{Real-Time Seamless Single Shot 6D Object Pose Prediction}},
      AUTHOR = {Tekin, Bugra and Sinha, Sudipta N. and Fua, Pascal},
      BOOKTITLE = {CVPR},
      YEAR = {2018}
@ -25,7 +27,7 @@ SingleShotPose is released under the MIT License (refer to the LICENSE file for
#### Environment and dependencies
The code is tested on Linux with CUDA v8 and cudNN v5.1. The implementation is based on PyTorch 0.3.1 and tested on Python2.7. The code requires the following dependencies that could be installed with conda or pip: numpy, scipy, PIL, opencv-python. For a version that is Python 3 and Pytorch 0.4 compatible, you could see [this link](https://github.com/btekin/singleshot6Dpose).
The code is tested on **Windows** with CUDA v8 and cuDNN v5.1. The implementation is based on **PyTorch 0.4.1** and tested on **Python 3.6**. The code requires the following dependencies that can be installed with conda or pip: numpy, scipy, PIL, opencv-python. For an earlier version that is compatible with PyTorch 0.3.1 and tested on Python 2.7, please see the ```py2``` folder.
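For example, in a fresh Python 3.6 environment the dependencies could be installed roughly as follows (PIL is provided by the Pillow package on PyPI; install PyTorch 0.4.1 separately following the official instructions for your CUDA version):
```
pip install numpy scipy Pillow opencv-python
```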
#### Downloading and preparing the data
@ -41,25 +43,30 @@ tar xf backup.tar
tar xf multi_obj_pose_estimation/backup_multi.tar -C multi_obj_pose_estimation/
tar xf VOCtrainval_11-May-2012.tar
```
Alternatively, you can directly go to the links above and manually download and extract the files at the corresponding directories. The whole download process might take a long while (~60 minutes).
Alternatively, you can directly go to the links above and manually download and extract the files at the corresponding directories. The whole download process might take a long while (~60 minutes). Please also be aware that access to OneDrive in some countries might be limited.
#### Training the model
To train the model, run
```
python train.py datafile cfgfile initweightfile
python train.py --datacfg [path_to_data_config_file] --modelcfg [path_to_model_config_file] --initweightfile [path_to_initialization_weights] --pretrain_num_epochs [number_of_epochs_to_pretrain]
```
e.g.
```
python train.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/init.weights
python train.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --initweightfile cfg/darknet19_448.conv.23 --pretrain_num_epochs 15
```
if you would like to start from ImageNet-initialized weights, or
```
python train.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --initweightfile backup/duck/init.weights
```
if you would like to start with an already pretrained model on LINEMOD, for faster convergence.
[datafile] contains information about the training/test splits and 3D object models
**[datacfg]** contains information about the training/test splits, 3D object models and camera parameters
[cfgfile] contains information about the network structure
**[modelcfg]** contains information about the network structure
[initweightfile] contains initialization weights. The weights "backup/[OBJECT_NAME]/init.weights" are pretrained on LINEMOD for faster convergence. We found it effective to pretrain the model without confidence estimation first and fine-tune the network later on with confidence estimation as well. "init.weights" contain the weights of these pretrained networks. However, you can also still train the network from a more crude initialization (with weights trained on ImageNet). This usually results in a slower and sometimes slightly worse convergence. You can find in cfg/ folder the file <<darknet19_448.conv.23>> that includes the network weights pretrained on ImageNet. Alternatively, you can pretrain your own weights by setting the regularization parameter for the confidence loss to 0 as explained in "Pretraining the model" section.
**[initweightfile]** contains initialization weights. <<darknet19_448.conv.23>> in the cfg/ folder contains the network weights pretrained on ImageNet. The weights "backup/[OBJECT_NAME]/init.weights" are pretrained on LINEMOD for faster convergence. We found it effective to pretrain the model without confidence estimation first and fine-tune the network later on with confidence estimation as well. "init.weights" contain the weights of these pretrained networks. You can also still train the network from a cruder initialization (with weights trained on ImageNet); this usually results in slower and sometimes slightly worse convergence.
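For reference, a data configuration file roughly follows a key = value layout such as the sketch below. The key names are the ones referenced in this README and in the evaluation code; the paths and values are illustrative (the camera intrinsics shown are the standard LINEMOD ones) and the shipped cfg/ape.data may contain additional entries:
```
train  = LINEMOD/ape/train.txt
valid  = LINEMOD/ape/test.txt
backup = backup/ape
mesh   = LINEMOD/ape/ape.ply
name   = ape
diam   = 0.103
width  = 640
height = 480
fx     = 572.4114
fy     = 573.5704
u0     = 325.2611
v0     = 242.0489
```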
At the start of the training you will see an output like this:
@ -74,30 +81,21 @@ layer filters size input output
31 detection
```
This defines the network structure. During training, the best network model is saved into the "model.weights" file. To train networks for other objects, just change the object name while calling the train function, e.g., "python train.py cfg/duck.data cfg/yolo-pose.cfg backup/duck/init.weights"
This defines the network structure. During training, the best network model is saved into the "model.weights" file. To train networks for other objects, just change the object name while calling the train function, e.g., "```python train.py --datacfg cfg/duck.data --modelcfg cfg/yolo-pose.cfg --initweightfile backup/duck/init.weights```". If you come across GPU memory errors while training, you could try lowering the batch size, for example to 16 or 8, to fit into memory. The open-source version of the code has undergone significant refactoring, and some models had to be retrained. The retrained models that we provide do not differ much from the initial results (sometimes slightly worse and sometimes slightly better).
#### Testing the model
To test the model, run
```
python valid.py datafile cfgfile weightfile
python valid.py --datacfg [path_to_data_config_file] --modelcfg [path_to_model_config_file] --weightfile [path_to_trained_model_weights]
```
e.g.
```
python valid.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/model_backup.weights
python valid.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --weightfile backup/ape/model_backup.weights
```
**[weightfile]** contains our trained models.
#### Pretraining the model (Optional)
Models are already pretrained but if you would like to pretrain the network from scratch and get the initialization weights yourself, you can run the following:
python train.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose-pre.cfg --initweightfile cfg/darknet19_448.conv.23
cp backup/ape/model.weights backup/ape/init.weights
During pretraining the regularization parameter for the confidence term is set to "0" in the config file "cfg/yolo-pose-pre.cfg". "darknet19_448.conv.23" includes the weights of YOLOv2 trained on ImageNet.
You could also use valid.ipynb to test the model and visualize the results.
#### Multi-object pose estimation on the OCCLUSION dataset
@ -125,23 +123,30 @@ python train_multi.py cfg/occlusion.data cfg/yolo-pose-multi.cfg backup_multi/in
#### Label files
Our label files consist of 21 values. We predict 9 points corresponding to the centroid and corners of the 3D object model. Additionally we predict the class in each cell. That makes 9x2+1 = 19 points. In multi-object training, during training, we assign whichever anchor box has the most similar size to the current object as the responsible one to predict the 2D coordinates for that object. To encode the size of the objects, we have additional 2 numbers for the range in x dimension and y dimension. Therefore, we have 9x2+1+2 = 21 numbers.
Our label files consist of 21 ground-truth values. We predict 9 points corresponding to the centroid and corners of the 3D object model. Additionally, we predict the class in each cell. That makes 9x2+1 = 19 values. In multi-object training, we assign whichever anchor box has the most similar size to the current object as the one responsible for predicting the 2D coordinates of that object. To encode the size of the objects, we add 2 numbers for the range in the x and y dimensions. Therefore, we have 9x2+1+2 = 21 numbers.
Respectively, 21 numbers correspond to the following: 1st number: class label, 2nd number: x0 (x-coordinate of the centroid), 3rd number: y0 (y-coordinate of the centroid), 4th number: x1 (x-coordinate of the first corner), 5th number: y1 (y-coordinate of the first corner), ..., 18th number: x8 (x-coordinate of the eighth corner), 19th number: y8 (y-coordinate of the eighth corner), 20th number: x range, 21st number: y range.
The coordinates are normalized by the image width and height: x / image_width and y / image_height. This is useful to have similar output ranges for the coordinate regression and object classification tasks.
The coordinates are normalized by the image width and height: ```x / image_width``` and ```y / image_height```. This is useful to have similar output ranges for the coordinate regression and object classification tasks.
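As a quick illustration of the layout described above, a single label line could be parsed as in the following minimal sketch (the file path is hypothetical and only serves as an example):
```python
import numpy as np

# One label line: class, 9 normalized (x, y) pairs, x range, y range
values = np.loadtxt('LINEMOD/ape/labels/000000.txt').reshape(-1)  # hypothetical example path
class_id = int(values[0])                  # 1st number: class label
keypoints = values[1:19].reshape(9, 2)     # centroid + 8 corners, already divided by width/height
x_range, y_range = values[19], values[20]  # 20th and 21st numbers: normalized 2D extents
print(class_id, keypoints[0], x_range, y_range)
```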
#### Training on your own dataset
#### Tips for training on your own dataset
To train on your own dataset, simply create the same folder structure with the provided LINEMOD dataset and adjust the paths in cfg/[OBJECT].data, [DATASET]/[OBJECT]/train.txt and [DATASET]/[OBJECT]/test.txt files. The folder for each object should contain the following:
We train and test our models on the LINEMOD dataset using the same train/test splits as [the BB8 method](https://arxiv.org/pdf/1703.10896.pdf) to validate our approach. If you would like to train a model on your own dataset, you could create the same folder structure as in the provided LINEMOD dataset and adjust the paths in the cfg/[OBJECT].data, [DATASET]/[OBJECT]/train.txt and [DATASET]/[OBJECT]/test.txt files. The folder for each object should contain the following:
(1) a folder containing image files,
(2) a folder containing label files (please refer to [this link](https://github.com/Microsoft/singleshotpose/blob/master/label_file_creation.md) for a detailed explanation on how to create labels),
(3) a text file containing the filenames for training images (train.txt),
(4) a text file containing the filenames for test images (test.txt),
(5) a .ply file containing the 3D object model
(6) optionally, a folder containing segmentation masks (if you want to change the background of your training images to be more robust to diverse backgrounds),
(2) a folder containing label files (please refer to [this link](https://github.com/Microsoft/singleshotpose/blob/master/label_file_creation.md) for a detailed explanation on how to create labels; you could also find the third-party [ObjectDatasetTools](https://github.com/F2Wang/ObjectDatasetTools) toolbox useful for creating ground-truth labels for 6D object pose estimation),
(3) a text file containing the filenames for training images (```train.txt```),
(4) a text file containing the filenames for test images (```test.txt```),
(5) a .ply file containing the 3D object model (the object model should be given in meters),
(6) optionally, a folder containing segmentation masks (if you want to change the background of your training images, which is essential for better generalization to diverse backgrounds),
Please also make sure to adjust the following values in the data and model configuration files according to your needs:
- You should change the "```diam```" value in the data configuration file to the diameter of the object model at hand (one way to compute it is shown in the sketch after this list).
- Depending on the size and variability of your training data, the learning rate schedule (steps, scales, max_epochs parameters in the ```yolo-pose.cfg``` file) and some data augmentation parameters (jitter, hue, saturation, exposure parameters in ```dataset.py```) might also need to be adjusted for a better convergence on your dataset.
- For multiple object pose estimation, you should also pre-compute anchor values using the procedure described in Section 3.2 of the paper and specify them in the model configuration file (```yolo-pose-multi.cfg```). Please also make sure to use the correct number of classes and specify it in ```yolo-pose-multi.cfg```.
- You should further change the image size and camera parameters (```fx```, ```fy```, ```u0```, ```v0```, ```width```, ```height```) in the data configuration files to the ones specific to your dataset.
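A common way to define the object diameter is the largest distance between any two vertices of the 3D model. The minimal sketch below uses the repository's MeshPly reader; the model path is illustrative, and whether this exactly reproduces the shipped ```diam``` values is an assumption:
```python
import numpy as np
from scipy.spatial.distance import pdist
from MeshPly import MeshPly  # the .ply reader shipped with this repository

# Load the object model and take the largest vertex-to-vertex distance as the
# diameter. For very dense meshes, pdist can be memory-hungry; subsample the
# vertices in that case.
mesh = MeshPly('LINEMOD/ape/ape.ply')   # illustrative path
vertices = np.array(mesh.vertices)      # N x 3, in meters (see item (5) above)
diam = float(pdist(vertices).max())
print('diam = %f' % diam)               # value to set in cfg/[OBJECT].data
```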
While creating a training dataset, sampling a large number of viewpoints/distances and modeling a large variability of illumination/occlusion/background settings would be important for increasing the generalization ability of the approach on your dataset. If you would like to adjust some model & loss parameters (e.g. the weighting factors for different loss terms) for your own purposes, you could do so in the model configuration file (```yolo-pose.cfg```).
#### Acknowledgments

cfg.py
View file

@ -53,9 +53,9 @@ def print_cfg(blocks):
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
width = (prev_width + 2*pad - kernel_size)/stride + 1
height = (prev_height + 2*pad - kernel_size)/stride + 1
pad = (kernel_size-1)//2 if is_pad else 0
width = (prev_width + 2*pad - kernel_size)//stride + 1
height = (prev_height + 2*pad - kernel_size)//stride + 1
print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
@ -66,8 +66,8 @@ def print_cfg(blocks):
elif block['type'] == 'maxpool':
pool_size = int(block['size'])
stride = int(block['stride'])
width = prev_width/stride
height = prev_height/stride
width = prev_width//stride
height = prev_height//stride
print('%5d %-6s %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'max', pool_size, pool_size, stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
@ -98,8 +98,8 @@ def print_cfg(blocks):
elif block['type'] == 'reorg':
stride = int(block['stride'])
filters = stride * stride * prev_filters
width = prev_width/stride
height = prev_height/stride
width = prev_width//stride
height = prev_height//stride
print('%5d %-6s / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'reorg', stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
@ -154,7 +154,7 @@ def load_conv(buf, start, conv_model):
num_w = conv_model.weight.numel()
num_b = conv_model.bias.numel()
conv_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w
return start
def save_conv(fp, conv_model):
@ -172,7 +172,7 @@ def load_conv_bn(buf, start, conv_model, bn_model):
bn_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight.data)); start = start + num_w
return start
def save_conv_bn(fp, conv_model, bn_model):
@ -193,7 +193,7 @@ def load_fc(buf, start, fc_model):
num_w = fc_model.weight.numel()
num_b = fc_model.bias.numel()
fc_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
fc_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
fc_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view_as(fc_model.weight.data)); start = start + num_w
return start
def save_fc(fp, fc_model):

View file

@ -28,10 +28,10 @@ class Reorg(nn.Module):
assert(W % stride == 0)
ws = stride
hs = stride
x = x.view(B, C, H/hs, hs, W/ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H/hs*W/ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H/hs, W/ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H/hs, W/ws)
x = x.view(B, C, H//hs, hs, W//ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H//hs*W//ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H//hs, W//ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H//hs, W//ws)
return x
class GlobalAvgPool2d(nn.Module):
@ -63,8 +63,11 @@ class Darknet(nn.Module):
self.models = self.create_network(self.blocks) # merge conv, bn,leaky
self.loss = self.models[len(self.models)-1]
self.width = int(self.blocks[0]['width'])
self.height = int(self.blocks[0]['height'])
self.width = int(self.blocks[0]['width'])
self.height = int(self.blocks[0]['height'])
self.test_width = int(self.blocks[0]['test_width'])
self.test_height = int(self.blocks[0]['test_height'])
self.num_keypoints = int(self.blocks[0]['num_keypoints'])
if self.blocks[(len(self.blocks)-1)]['type'] == 'region':
self.anchors = self.loss.anchors
@ -146,7 +149,7 @@ class Darknet(nn.Module):
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
pad = (kernel_size-1)//2 if is_pad else 0
activation = block['activation']
model = nn.Sequential()
if batch_normalize:
@ -233,7 +236,7 @@ class Darknet(nn.Module):
loss.anchors = [float(i) for i in anchors]
loss.num_classes = int(block['classes'])
loss.num_anchors = int(block['num'])
loss.anchor_step = len(loss.anchors)/loss.num_anchors
loss.anchor_step = len(loss.anchors)//loss.num_anchors
loss.object_scale = float(block['object_scale'])
loss.noobject_scale = float(block['noobject_scale'])
loss.class_scale = float(block['class_scale'])

View file

@ -13,13 +13,28 @@ from utils import read_truths_args, read_truths, get_all_files
class listDataset(Dataset):
def __init__(self, root, shape=None, shuffle=True, transform=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, bg_file_names=None):
def __init__(self, root, shape=None, shuffle=True, transform=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, cell_size=32, bg_file_names=None, num_keypoints=9, max_num_gt=50):
# root : list of training or test images
# shape : shape of the image input to the network
# shuffle : whether to shuffle or not
# transform : any pytorch-specific transformation to the input image
# target_transform : any pytorch-specific transformation to the target output
# train : whether it is training data or test data
# seen : the number of visited examples so far (batch iterations x batch size)
# batch_size : how many examples there are in the batch
# num_workers : number of worker processes used for data loading
# bg_file_names : the filenames for images from which you assign random backgrounds
# read the list of dataset images
with open(root, 'r') as file:
self.lines = file.readlines()
# Shuffle
if shuffle:
random.shuffle(self.lines)
# Initialize variables
self.nSamples = len(self.lines)
self.transform = transform
self.target_transform = target_transform
@ -29,40 +44,53 @@ class listDataset(Dataset):
self.batch_size = batch_size
self.num_workers = num_workers
self.bg_file_names = bg_file_names
self.cell_size = cell_size
self.nbatches = self.nSamples // self.batch_size
self.num_keypoints = num_keypoints
self.max_num_gt = max_num_gt # maximum number of ground-truth labels an image can have
# Get the number of samples in the dataset
def __len__(self):
return self.nSamples
# Get a sample from the dataset
def __getitem__(self, index):
# Ensure the index is within the number of samples in the dataset, otherwise raise an error
assert index <= len(self), 'index range error'
# Get the image path
imgpath = self.lines[index].rstrip()
if self.train and index % 32== 0:
if self.seen < 400*32:
width = 13*32
# Decide to which size you are going to resize the image, depending on the epoch (10, 20, etc.)
if self.train and index % self.batch_size== 0:
if self.seen < 10*self.nbatches*self.batch_size:
width = 13*self.cell_size
self.shape = (width, width)
elif self.seen < 800*32:
width = (random.randint(0,7) + 13)*32
elif self.seen < 20*self.nbatches*self.batch_size:
width = (random.randint(0,7) + 13)*self.cell_size
self.shape = (width, width)
elif self.seen < 1200*32:
width = (random.randint(0,9) + 12)*32
elif self.seen < 30*self.nbatches*self.batch_size:
width = (random.randint(0,9) + 12)*self.cell_size
self.shape = (width, width)
elif self.seen < 1600*32:
width = (random.randint(0,11) + 11)*32
elif self.seen < 40*self.nbatches*self.batch_size:
width = (random.randint(0,11) + 11)*self.cell_size
self.shape = (width, width)
elif self.seen < 2000*32:
width = (random.randint(0,13) + 10)*32
elif self.seen < 50*self.nbatches*self.batch_size:
width = (random.randint(0,13) + 10)*self.cell_size
self.shape = (width, width)
elif self.seen < 2400*32:
width = (random.randint(0,15) + 9)*32
elif self.seen < 60*self.nbatches*self.batch_size:
width = (random.randint(0,15) + 9)*self.cell_size
self.shape = (width, width)
elif self.seen < 3000*32:
width = (random.randint(0,17) + 8)*32
elif self.seen < 70*self.nbatches*self.batch_size:
width = (random.randint(0,17) + 8)*self.cell_size
self.shape = (width, width)
else: # self.seen < 20000*64:
width = (random.randint(0,19) + 7)*32
else:
width = (random.randint(0,19) + 7)*self.cell_size
self.shape = (width, width)
if self.train:
# Decide on how much data augmentation you are going to apply
jitter = 0.2
hue = 0.1
saturation = 1.5
@ -70,32 +98,44 @@ class listDataset(Dataset):
# Get background image path
random_bg_index = random.randint(0, len(self.bg_file_names) - 1)
bgpath = self.bg_file_names[random_bg_index]
bgpath = self.bg_file_names[random_bg_index]
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
# Get the data-augmented image and its corresponding labels
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath, self.num_keypoints, self.max_num_gt)
# Convert the labels to PyTorch variables
label = torch.from_numpy(label)
else:
# Get the validation image, resize it to the network input size
img = Image.open(imgpath).convert('RGB')
if self.shape:
img = img.resize(self.shape)
# Read the validation labels, allowing up to max_num_gt ground-truth objects in an image
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
label = torch.zeros(50*21)
num_labels = 2*self.num_keypoints+3 # +2 for ground-truth width/height, +1 for class label
label = torch.zeros(self.max_num_gt*num_labels)
if os.path.getsize(labpath):
ow, oh = img.size
tmp = torch.from_numpy(read_truths_args(labpath, 8.0/ow))
tmp = torch.from_numpy(read_truths_args(labpath))
tmp = tmp.view(-1)
tsz = tmp.numel()
if tsz > 50*21:
label = tmp[0:50*21]
if tsz > self.max_num_gt*num_labels:
label = tmp[0:self.max_num_gt*num_labels]
elif tsz > 0:
label[0:tsz] = tmp
# Transform the image data to PyTorch tensors
if self.transform is not None:
img = self.transform(img)
# If there is any PyTorch-specific transformation, transform the label data
if self.target_transform is not None:
label = self.target_transform(label)
# Increase the number of seen examples
self.seen = self.seen + self.num_workers
# Return the retrieved image and its corresponding label
return (img, label)

View file

@ -73,73 +73,32 @@ def data_augmentation(img, shape, jitter, hue, saturation, exposure):
return img, flip, dx,dy,sx,sy
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy):
max_boxes = 50
label = np.zeros((max_boxes,21))
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy, num_keypoints, max_num_gt):
num_labels = 2 * num_keypoints + 3
label = np.zeros((max_num_gt,num_labels))
if os.path.getsize(labpath):
bs = np.loadtxt(labpath)
if bs is None:
return label
bs = np.reshape(bs, (-1, 21))
bs = np.reshape(bs, (-1, num_labels))
cc = 0
for i in range(bs.shape[0]):
x0 = bs[i][1]
y0 = bs[i][2]
x1 = bs[i][3]
y1 = bs[i][4]
x2 = bs[i][5]
y2 = bs[i][6]
x3 = bs[i][7]
y3 = bs[i][8]
x4 = bs[i][9]
y4 = bs[i][10]
x5 = bs[i][11]
y5 = bs[i][12]
x6 = bs[i][13]
y6 = bs[i][14]
x7 = bs[i][15]
y7 = bs[i][16]
x8 = bs[i][17]
y8 = bs[i][18]
xs = list()
ys = list()
for j in range(num_keypoints):
xs.append(bs[i][2*j+1])
ys.append(bs[i][2*j+2])
x0 = min(0.999, max(0, x0 * sx - dx))
y0 = min(0.999, max(0, y0 * sy - dy))
x1 = min(0.999, max(0, x1 * sx - dx))
y1 = min(0.999, max(0, y1 * sy - dy))
x2 = min(0.999, max(0, x2 * sx - dx))
y2 = min(0.999, max(0, y2 * sy - dy))
x3 = min(0.999, max(0, x3 * sx - dx))
y3 = min(0.999, max(0, y3 * sy - dy))
x4 = min(0.999, max(0, x4 * sx - dx))
y4 = min(0.999, max(0, y4 * sy - dy))
x5 = min(0.999, max(0, x5 * sx - dx))
y5 = min(0.999, max(0, y5 * sy - dy))
x6 = min(0.999, max(0, x6 * sx - dx))
y6 = min(0.999, max(0, y6 * sy - dy))
x7 = min(0.999, max(0, x7 * sx - dx))
y7 = min(0.999, max(0, y7 * sy - dy))
x8 = min(0.999, max(0, x8 * sx - dx))
y8 = min(0.999, max(0, y8 * sy - dy))
bs[i][1] = x0
bs[i][2] = y0
bs[i][3] = x1
bs[i][4] = y1
bs[i][5] = x2
bs[i][6] = y2
bs[i][7] = x3
bs[i][8] = y3
bs[i][9] = x4
bs[i][10] = y4
bs[i][11] = x5
bs[i][12] = y5
bs[i][13] = x6
bs[i][14] = y6
bs[i][15] = x7
bs[i][16] = y7
bs[i][17] = x8
bs[i][18] = y8
# Make sure the centroid of the object/hand is within the image
xs[0] = min(0.999, max(0, xs[0] * sx - dx))
ys[0] = min(0.999, max(0, ys[0] * sy - dy))
for j in range(1,num_keypoints):
xs[j] = xs[j] * sx - dx
ys[j] = ys[j] * sy - dy
for j in range(num_keypoints):
bs[i][2*j+1] = xs[j]
bs[i][2*j+2] = ys[j]
label[cc] = bs[i]
cc += 1
if cc >= 50:
@ -167,7 +126,7 @@ def change_background(img, mask, bg):
return out
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath, num_keypoints, max_num_gt):
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
@ -179,6 +138,6 @@ def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpat
img = change_background(img, mask, bg)
img,flip,dx,dy,sx,sy = data_augmentation(img, shape, jitter, hue, saturation, exposure)
ow, oh = img.size
label = fill_truth_detection(labpath, ow, oh, flip, dx, dy, 1./sx, 1./sy)
label = fill_truth_detection(labpath, ow, oh, flip, dx, dy, 1./sx, 1./sy, num_keypoints, max_num_gt)
return img,label

View file

@ -28,10 +28,10 @@ class Reorg(nn.Module):
assert(W % stride == 0)
ws = stride
hs = stride
x = x.view(B, C, H/hs, hs, W/ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H/hs*W/ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H/hs, W/ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H/hs, W/ws)
x = x.view(B, C, H//hs, hs, W//ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H//hs*W//ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H//hs, W//ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H//hs, W//ws)
return x
class GlobalAvgPool2d(nn.Module):
@ -146,7 +146,7 @@ class Darknet(nn.Module):
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
pad = (kernel_size-1)//2 if is_pad else 0
activation = block['activation']
model = nn.Sequential()
if batch_normalize:
@ -230,7 +230,7 @@ class Darknet(nn.Module):
loss.anchors = [float(i) for i in anchors]
loss.num_classes = int(block['classes'])
loss.num_anchors = int(block['num'])
loss.anchor_step = len(loss.anchors)/loss.num_anchors
loss.anchor_step = len(loss.anchors)//loss.num_anchors
loss.object_scale = float(block['object_scale'])
loss.noobject_scale = float(block['noobject_scale'])
loss.class_scale = float(block['class_scale'])

View file

@ -8,12 +8,12 @@ import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from utils import read_truths_args, read_truths, get_all_files
from utils_multi import read_truths_args, read_truths, get_all_files
from image_multi import *
class listDataset(Dataset):
def __init__(self, root, shape=None, shuffle=True, transform=None, objclass=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, bg_file_names=None): # bg='/cvlabdata1/home/btekin/ope/data/office_bg'
def __init__(self, root, shape=None, shuffle=True, transform=None, objclass=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, cell_size=32, bg_file_names=None, num_keypoints=9, max_num_gt=50):
with open(root, 'r') as file:
self.lines = file.readlines()
if shuffle:
@ -26,9 +26,12 @@ class listDataset(Dataset):
self.seen = seen
self.batch_size = batch_size
self.num_workers = num_workers
# self.bg_file_names = get_all_files(bg)
self.bg_file_names = bg_file_names
self.objclass = objclass
self.cell_size = cell_size
self.nbatches = self.nSamples // self.batch_size
self.num_keypoints = num_keypoints
self.max_num_gt = max_num_gt # maximum number of ground-truth labels an image can have
def __len__(self):
return self.nSamples
@ -37,25 +40,25 @@ class listDataset(Dataset):
assert index <= len(self), 'index range error'
imgpath = self.lines[index].rstrip()
if self.train and index % 64== 0:
if self.seen < 4000*64:
width = 13*32
if self.train and index % self.batch_size == 0:
if self.seen < 20*self.nbatches*self.batch_size:
width = 13*self.cell_size
self.shape = (width, width)
elif self.seen < 8000*64:
width = (random.randint(0,3) + 13)*32
elif self.seen < 40*self.nbatches*self.batch_size:
width = (random.randint(0,3) + 13)*self.cell_size
self.shape = (width, width)
elif self.seen < 12000*64:
width = (random.randint(0,5) + 12)*32
elif self.seen < 60*self.nbatches*self.batch_size:
width = (random.randint(0,5) + 12)*self.cell_size
self.shape = (width, width)
elif self.seen < 16000*64:
width = (random.randint(0,7) + 11)*32
elif self.seen < 80*self.nbatches*self.batch_size:
width = (random.randint(0,7) + 11)*self.cell_size
self.shape = (width, width)
else: # self.seen < 20000*64:
width = (random.randint(0,9) + 10)*32
else:
width = (random.randint(0,9) + 10)*self.cell_size
self.shape = (width, width)
if self.train:
# jitter = 0.2
# Decide on how much data augmentation you are going to apply
jitter = 0.1
hue = 0.05
saturation = 1.5
@ -65,7 +68,7 @@ class listDataset(Dataset):
random_bg_index = random.randint(0, len(self.bg_file_names) - 1)
bgpath = self.bg_file_names[random_bg_index]
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath, self.num_keypoints, self.max_num_gt)
label = torch.from_numpy(label)
else:
img = Image.open(imgpath).convert('RGB')
@ -73,14 +76,15 @@ class listDataset(Dataset):
img = img.resize(self.shape)
labpath = imgpath.replace('benchvise', self.objclass).replace('images', 'labels_occlusion').replace('JPEGImages', 'labels_occlusion').replace('.jpg', '.txt').replace('.png','.txt')
label = torch.zeros(50*21)
num_labels = 2*self.num_keypoints+3 # +2 for ground-truth width/height, +1 for class label
label = torch.zeros(self.max_num_gt*num_labels)
if os.path.getsize(labpath):
ow, oh = img.size
tmp = torch.from_numpy(read_truths_args(labpath, 8.0/ow))
tmp = torch.from_numpy(read_truths_args(labpath))
tmp = tmp.view(-1)
tsz = tmp.numel()
if tsz > 50*21:
label = tmp[0:50*21]
if tsz > self.max_num_gt*num_labels:
label = tmp[0:self.max_num_gt*num_labels]
elif tsz > 0:
label[0:tsz] = tmp

View file

@ -5,20 +5,6 @@ import os
from PIL import Image, ImageChops, ImageMath
import numpy as np
def load_data_detection_backup(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
## data augmentation
img = Image.open(imgpath).convert('RGB')
mask = Image.open(maskpath).convert('RGB')
bg = Image.open(bgpath).convert('RGB')
img = change_background(img, mask, bg)
img,flip,dx,dy,sx,sy = data_augmentation(img, shape, jitter, hue, saturation, exposure)
label = fill_truth_detection(labpath, img.width, img.height, flip, dx, dy, 1./sx, 1./sy)
return img,label
def get_add_objs(objname):
# Decide how many additional objects you will augment and what will be the other types of objects
if objname == 'ape':
@ -87,7 +73,6 @@ def distort_image(im, hue, sat, val):
im = Image.merge(im.mode, tuple(cs))
im = im.convert('RGB')
#constrain_image(im)
return im
def rand_scale(s):
@ -135,98 +120,45 @@ def data_augmentation(img, shape, jitter, hue, saturation, exposure):
return img, flip, dx,dy,sx,sy
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy):
max_boxes = 50
label = np.zeros((max_boxes,21))
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy, num_keypoints, max_num_gt):
num_labels = 2*num_keypoints+3 # +2 for width, height, +1 for class label
label = np.zeros((max_num_gt,num_labels))
if os.path.getsize(labpath):
bs = np.loadtxt(labpath)
if bs is None:
return label
bs = np.reshape(bs, (-1, 21))
bs = np.reshape(bs, (-1, num_labels))
cc = 0
for i in range(bs.shape[0]):
x0 = bs[i][1]
y0 = bs[i][2]
x1 = bs[i][3]
y1 = bs[i][4]
x2 = bs[i][5]
y2 = bs[i][6]
x3 = bs[i][7]
y3 = bs[i][8]
x4 = bs[i][9]
y4 = bs[i][10]
x5 = bs[i][11]
y5 = bs[i][12]
x6 = bs[i][13]
y6 = bs[i][14]
x7 = bs[i][15]
y7 = bs[i][16]
x8 = bs[i][17]
y8 = bs[i][18]
xs = list()
ys = list()
for j in range(num_keypoints):
xs.append(bs[i][2*j+1])
ys.append(bs[i][2*j+2])
x0 = min(0.999, max(0, x0 * sx - dx))
y0 = min(0.999, max(0, y0 * sy - dy))
x1 = min(0.999, max(0, x1 * sx - dx))
y1 = min(0.999, max(0, y1 * sy - dy))
x2 = min(0.999, max(0, x2 * sx - dx))
y2 = min(0.999, max(0, y2 * sy - dy))
x3 = min(0.999, max(0, x3 * sx - dx))
y3 = min(0.999, max(0, y3 * sy - dy))
x4 = min(0.999, max(0, x4 * sx - dx))
y4 = min(0.999, max(0, y4 * sy - dy))
x5 = min(0.999, max(0, x5 * sx - dx))
y5 = min(0.999, max(0, y5 * sy - dy))
x6 = min(0.999, max(0, x6 * sx - dx))
y6 = min(0.999, max(0, y6 * sy - dy))
x7 = min(0.999, max(0, x7 * sx - dx))
y7 = min(0.999, max(0, y7 * sy - dy))
x8 = min(0.999, max(0, x8 * sx - dx))
y8 = min(0.999, max(0, y8 * sy - dy))
# Make sure the centroid of the object/hand is within the image
xs[0] = min(0.999, max(0, xs[0] * sx - dx))
ys[0] = min(0.999, max(0, ys[0] * sy - dy))
for j in range(1,num_keypoints):
xs[j] = xs[j] * sx - dx
ys[j] = ys[j] * sy - dy
bs[i][0] = bs[i][0]
bs[i][1] = x0
bs[i][2] = y0
bs[i][3] = x1
bs[i][4] = y1
bs[i][5] = x2
bs[i][6] = y2
bs[i][7] = x3
bs[i][8] = y3
bs[i][9] = x4
bs[i][10] = y4
bs[i][11] = x5
bs[i][12] = y5
bs[i][13] = x6
bs[i][14] = y6
bs[i][15] = x7
bs[i][16] = y7
bs[i][17] = x8
bs[i][18] = y8
for j in range(num_keypoints):
bs[i][2*j+1] = xs[j]
bs[i][2*j+2] = ys[j]
xs = [x1, x2, x3, x4, x5, x6, x7, x8]
ys = [y1, y2, y3, y4, y5, y6, y7, y8]
min_x = min(xs);
max_x = max(xs);
min_y = min(ys);
max_y = max(ys);
bs[i][19] = max_x - min_x;
bs[i][20] = max_y - min_y;
if flip:
bs[i][1] = 0.999 - bs[i][1]
bs[i][3] = 0.999 - bs[i][3]
bs[i][5] = 0.999 - bs[i][5]
bs[i][7] = 0.999 - bs[i][7]
bs[i][9] = 0.999 - bs[i][9]
bs[i][11] = 0.999 - bs[i][11]
bs[i][13] = 0.999 - bs[i][13]
bs[i][15] = 0.999 - bs[i][15]
bs[i][17] = 0.999 - bs[i][17]
bs[i][2*num_keypoints+1] = max_x - min_x;
bs[i][2*num_keypoints+2] = max_y - min_y;
label[cc] = bs[i]
cc += 1
if cc >= 50:
if cc >= max_num_gt:
break
label = np.reshape(label, (-1))
@ -364,10 +296,10 @@ def superimpose_masks(mask, total_mask):
return out
def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure):
def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure, num_keypoints, max_num_gt):
pixelThreshold = 200
num_labels = 2*num_keypoints+3
random.shuffle(add_objs)
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
@ -377,8 +309,8 @@ def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation,
iw, ih = img.size
mask = Image.open(maskpath).convert('RGB')
img,mask,flip,dx,dy,sx,sy = shifted_data_augmentation_with_mask(img, mask, shape, jitter, hue, saturation, exposure)
label = fill_truth_detection(labpath, iw, ih, flip, dx, dy, 1./sx, 1./sy)
total_label = np.reshape(label, (-1, 21))
label = fill_truth_detection(labpath, iw, ih, flip, dx, dy, 1./sx, 1./sy, num_keypoints, max_num_gt)
total_label = np.reshape(label, (-1, num_labels))
# Mask the background
masked_img = mask_background(img, mask)
@ -406,7 +338,7 @@ def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation,
obj_rand_masked_img = mask_background(obj_rand_img, obj_rand_mask)
obj_rand_masked_img,obj_rand_mask,flip,dx,dy,sx,sy = data_augmentation_with_mask(obj_rand_masked_img, obj_rand_mask, shape, jitter, hue, saturation, exposure)
obj_rand_label = fill_truth_detection(obj_rand_lab_path, iw, ih, flip, dx, dy, 1./sx, 1./sy)
obj_rand_label = fill_truth_detection(obj_rand_lab_path, iw, ih, flip, dx, dy, 1./sx, 1./sy, num_keypoints, max_num_gt)
# compute intersection (ratio of the object part intersecting with other object parts over the area of the object)
xx = np.array(obj_rand_mask)
@ -422,7 +354,7 @@ def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation,
successful = True
total_mask = superimpose_masks(obj_rand_mask, total_mask) # total_mask + obj_rand_mask
total_masked_img = superimpose_masked_imgs(obj_rand_masked_img, obj_rand_mask, total_masked_img) # total_masked_img + obj_rand_masked_img
obj_rand_label = np.reshape(obj_rand_label, (-1, 21))
obj_rand_label = np.reshape(obj_rand_label, (-1, num_labels))
total_label[count, :] = obj_rand_label[0, :]
count = count + 1
else:
@ -432,7 +364,7 @@ def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation,
return total_masked_img, np.reshape(total_label, (-1)), total_mask
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath, num_keypoints, max_num_gt):
# Read the background image
bg = Image.open(bgpath).convert('RGB')
@ -441,10 +373,11 @@ def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpat
dirname = os.path.dirname(os.path.dirname(imgpath)) ## dir of dir of file
objname = os.path.basename(dirname)
add_objs = get_add_objs(objname)
num_labels = 2*num_keypoints+3
# Add additional objects in the scene, apply data augmentation on the objects
total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure, num_keypoints, max_num_gt)
img = change_background(total_masked_img, total_mask, bg)
lb = np.reshape(label, (-1, 21))
lb = np.reshape(label, (-1, num_labels))
return img,label

View file

@ -4,178 +4,110 @@ import math
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils import *
from utils_multi import *
def build_targets(pred_corners, target, anchors, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
def build_targets(pred_corners, target, num_keypoints, anchors, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
nB = target.size(0)
nA = num_anchors
nC = num_classes
anchor_step = len(anchors)/num_anchors
anchor_step = len(anchors)//num_anchors
conf_mask = torch.ones(nB, nA, nH, nW) * noobject_scale
coord_mask = torch.zeros(nB, nA, nH, nW)
cls_mask = torch.zeros(nB, nA, nH, nW)
tx0 = torch.zeros(nB, nA, nH, nW)
ty0 = torch.zeros(nB, nA, nH, nW)
tx1 = torch.zeros(nB, nA, nH, nW)
ty1 = torch.zeros(nB, nA, nH, nW)
tx2 = torch.zeros(nB, nA, nH, nW)
ty2 = torch.zeros(nB, nA, nH, nW)
tx3 = torch.zeros(nB, nA, nH, nW)
ty3 = torch.zeros(nB, nA, nH, nW)
tx4 = torch.zeros(nB, nA, nH, nW)
ty4 = torch.zeros(nB, nA, nH, nW)
tx5 = torch.zeros(nB, nA, nH, nW)
ty5 = torch.zeros(nB, nA, nH, nW)
tx6 = torch.zeros(nB, nA, nH, nW)
ty6 = torch.zeros(nB, nA, nH, nW)
tx7 = torch.zeros(nB, nA, nH, nW)
ty7 = torch.zeros(nB, nA, nH, nW)
tx8 = torch.zeros(nB, nA, nH, nW)
ty8 = torch.zeros(nB, nA, nH, nW)
txs = list()
tys = list()
for i in range(num_keypoints):
txs.append(torch.zeros(nB, nA, nH, nW))
tys.append(torch.zeros(nB, nA, nH, nW))
tconf = torch.zeros(nB, nA, nH, nW)
tcls = torch.zeros(nB, nA, nH, nW)
num_labels = 2 * num_keypoints + 3 # +2 for width, height and +1 for class within label files
nAnchors = nA*nH*nW
nPixels = nH*nW
for b in xrange(nB):
for b in range(nB):
cur_pred_corners = pred_corners[b*nAnchors:(b+1)*nAnchors].t()
cur_confs = torch.zeros(nAnchors)
for t in xrange(50):
if target[b][t*21+1] == 0:
for t in range(50):
if target[b][t*num_labels+1] == 0:
break
gx0 = target[b][t*21+1]*nW
gy0 = target[b][t*21+2]*nH
gx1 = target[b][t*21+3]*nW
gy1 = target[b][t*21+4]*nH
gx2 = target[b][t*21+5]*nW
gy2 = target[b][t*21+6]*nH
gx3 = target[b][t*21+7]*nW
gy3 = target[b][t*21+8]*nH
gx4 = target[b][t*21+9]*nW
gy4 = target[b][t*21+10]*nH
gx5 = target[b][t*21+11]*nW
gy5 = target[b][t*21+12]*nH
gx6 = target[b][t*21+13]*nW
gy6 = target[b][t*21+14]*nH
gx7 = target[b][t*21+15]*nW
gy7 = target[b][t*21+16]*nH
gx8 = target[b][t*21+17]*nW
gy8 = target[b][t*21+18]*nH
g = list()
for i in range(num_keypoints):
g.append(target[b][t*num_labels+2*i+1])
g.append(target[b][t*num_labels+2*i+2])
cur_gt_corners = torch.FloatTensor([gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]).repeat(nAnchors,1).t() # 16 x nAnchors
cur_confs = torch.max(cur_confs, corner_confidences9(cur_pred_corners, cur_gt_corners)) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
cur_gt_corners = torch.FloatTensor(g).repeat(nAnchors,1).t() # 18 x nAnchors
cur_confs = torch.max(cur_confs.view_as(conf_mask[b]), corner_confidences(cur_pred_corners, cur_gt_corners).view_as(conf_mask[b])) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
conf_mask[b][cur_confs>sil_thresh] = 0
if seen < -1:#6400:
tx0.fill_(0.5)
ty0.fill_(0.5)
tx1.fill_(0.5)
ty1.fill_(0.5)
tx2.fill_(0.5)
ty2.fill_(0.5)
tx3.fill_(0.5)
ty3.fill_(0.5)
tx4.fill_(0.5)
ty4.fill_(0.5)
tx5.fill_(0.5)
ty5.fill_(0.5)
tx6.fill_(0.5)
ty6.fill_(0.5)
tx7.fill_(0.5)
ty7.fill_(0.5)
tx8.fill_(0.5)
ty8.fill_(0.5)
coord_mask.fill_(1)
nGT = 0
nCorrect = 0
for b in xrange(nB):
for t in xrange(50):
if target[b][t*21+1] == 0:
for b in range(nB):
for t in range(50):
if target[b][t*num_labels+1] == 0:
break
nGT = nGT + 1
best_iou = 0.0
best_n = -1
min_dist = 10000
gx0 = target[b][t*21+1] * nW
gy0 = target[b][t*21+2] * nH
gi0 = int(gx0)
gj0 = int(gy0)
gx1 = target[b][t*21+3] * nW
gy1 = target[b][t*21+4] * nH
gx2 = target[b][t*21+5] * nW
gy2 = target[b][t*21+6] * nH
gx3 = target[b][t*21+7] * nW
gy3 = target[b][t*21+8] * nH
gx4 = target[b][t*21+9] * nW
gy4 = target[b][t*21+10] * nH
gx5 = target[b][t*21+11] * nW
gy5 = target[b][t*21+12] * nH
gx6 = target[b][t*21+13] * nW
gy6 = target[b][t*21+14] * nH
gx7 = target[b][t*21+15] * nW
gy7 = target[b][t*21+16] * nH
gx8 = target[b][t*21+17] * nW
gy8 = target[b][t*21+18] * nH
min_dist = sys.maxsize
gx = list()
gy = list()
gt_box = list()
for i in range(num_keypoints):
gt_box.extend([target[b][t*num_labels+2*i+1], target[b][t*num_labels+2*i+2]])
gx.append(target[b][t*num_labels+2*i+1] * nW)
gy.append(target[b][t*num_labels+2*i+2] * nH)
if i == 0:
gi0 = int(gx[i])
gj0 = int(gy[i])
pred_box = pred_corners[b*nAnchors+best_n*nPixels+gj0*nW+gi0]
conf = corner_confidence(gt_box, pred_box)
gw = target[b][t*21+19]*nW
gh = target[b][t*21+20]*nH
gt_box = [0, 0, gw, gh]
for n in xrange(nA):
# Decide which anchor to use during prediction
gw = target[b][t*num_labels+num_labels-2]*nW
gh = target[b][t*num_labels+num_labels-1]*nH
gt_2d_box = [0, 0, gw, gh]
for n in range(nA):
aw = anchors[anchor_step*n]
ah = anchors[anchor_step*n+1]
anchor_box = [0, 0, aw, ah]
iou = bbox_iou(anchor_box, gt_box, x1y1x2y2=False)
iou = bbox_iou(anchor_box, gt_2d_box, x1y1x2y2=False)
if iou > best_iou:
best_iou = iou
best_n = n
gt_box = [gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]
pred_box = pred_corners[b*nAnchors+best_n*nPixels+gj0*nW+gi0]
conf = corner_confidence9(gt_box, pred_box)
coord_mask[b][best_n][gj0][gi0] = 1
cls_mask[b][best_n][gj0][gi0] = 1
conf_mask[b][best_n][gj0][gi0] = object_scale
tx0[b][best_n][gj0][gi0] = target[b][t*21+1] * nW - gi0
ty0[b][best_n][gj0][gi0] = target[b][t*21+2] * nH - gj0
tx1[b][best_n][gj0][gi0] = target[b][t*21+3] * nW - gi0
ty1[b][best_n][gj0][gi0] = target[b][t*21+4] * nH - gj0
tx2[b][best_n][gj0][gi0] = target[b][t*21+5] * nW - gi0
ty2[b][best_n][gj0][gi0] = target[b][t*21+6] * nH - gj0
tx3[b][best_n][gj0][gi0] = target[b][t*21+7] * nW - gi0
ty3[b][best_n][gj0][gi0] = target[b][t*21+8] * nH - gj0
tx4[b][best_n][gj0][gi0] = target[b][t*21+9] * nW - gi0
ty4[b][best_n][gj0][gi0] = target[b][t*21+10] * nH - gj0
tx5[b][best_n][gj0][gi0] = target[b][t*21+11] * nW - gi0
ty5[b][best_n][gj0][gi0] = target[b][t*21+12] * nH - gj0
tx6[b][best_n][gj0][gi0] = target[b][t*21+13] * nW - gi0
ty6[b][best_n][gj0][gi0] = target[b][t*21+14] * nH - gj0
tx7[b][best_n][gj0][gi0] = target[b][t*21+15] * nW - gi0
ty7[b][best_n][gj0][gi0] = target[b][t*21+16] * nH - gj0
tx8[b][best_n][gj0][gi0] = target[b][t*21+17] * nW - gi0
ty8[b][best_n][gj0][gi0] = target[b][t*21+18] * nH - gj0
# Update targets
for i in range(num_keypoints):
txs[i][b][best_n][gj0][gi0] = gx[i]- gi0
tys[i][b][best_n][gj0][gi0] = gy[i]- gj0
tconf[b][best_n][gj0][gi0] = conf
tcls[b][best_n][gj0][gi0] = target[b][t*21]
tcls[b][best_n][gj0][gi0] = target[b][t*num_labels]
if conf > 0.5:
nCorrect = nCorrect + 1
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, txs, tys, tconf, tcls
class RegionLoss(nn.Module):
def __init__(self, num_classes=0, anchors=[], num_anchors=5):
def __init__(self, num_keypoints=9, num_classes=13, anchors=[], num_anchors=5, pretrain_num_epochs=15):
super(RegionLoss, self).__init__()
self.num_classes = num_classes
self.anchors = anchors
self.num_anchors = num_anchors
self.anchor_step = len(anchors)/num_anchors
self.num_keypoints = num_keypoints
self.coord_scale = 1
self.noobject_scale = 1
self.object_scale = 5
self.class_scale = 1
self.thresh = 0.6
self.seen = 0
self.pretrain_num_epochs = pretrain_num_epochs
def forward(self, output, target):
def forward(self, output, target, epoch):
# Parameters
t0 = time.time()
nB = output.data.size(0)
@ -185,81 +117,40 @@ class RegionLoss(nn.Module):
nW = output.data.size(3)
# Activation
output = output.view(nB, nA, (19+nC), nH, nW)
x0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW))
y0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW))
x1 = output.index_select(2, Variable(torch.cuda.LongTensor([2]))).view(nB, nA, nH, nW)
y1 = output.index_select(2, Variable(torch.cuda.LongTensor([3]))).view(nB, nA, nH, nW)
x2 = output.index_select(2, Variable(torch.cuda.LongTensor([4]))).view(nB, nA, nH, nW)
y2 = output.index_select(2, Variable(torch.cuda.LongTensor([5]))).view(nB, nA, nH, nW)
x3 = output.index_select(2, Variable(torch.cuda.LongTensor([6]))).view(nB, nA, nH, nW)
y3 = output.index_select(2, Variable(torch.cuda.LongTensor([7]))).view(nB, nA, nH, nW)
x4 = output.index_select(2, Variable(torch.cuda.LongTensor([8]))).view(nB, nA, nH, nW)
y4 = output.index_select(2, Variable(torch.cuda.LongTensor([9]))).view(nB, nA, nH, nW)
x5 = output.index_select(2, Variable(torch.cuda.LongTensor([10]))).view(nB, nA, nH, nW)
y5 = output.index_select(2, Variable(torch.cuda.LongTensor([11]))).view(nB, nA, nH, nW)
x6 = output.index_select(2, Variable(torch.cuda.LongTensor([12]))).view(nB, nA, nH, nW)
y6 = output.index_select(2, Variable(torch.cuda.LongTensor([13]))).view(nB, nA, nH, nW)
x7 = output.index_select(2, Variable(torch.cuda.LongTensor([14]))).view(nB, nA, nH, nW)
y7 = output.index_select(2, Variable(torch.cuda.LongTensor([15]))).view(nB, nA, nH, nW)
x8 = output.index_select(2, Variable(torch.cuda.LongTensor([16]))).view(nB, nA, nH, nW)
y8 = output.index_select(2, Variable(torch.cuda.LongTensor([17]))).view(nB, nA, nH, nW)
conf = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([18]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(19,19+nC-1,nC).long().cuda()))
output = output.view(nB, nA, (2*self.num_keypoints+1+nC), nH, nW)
x = list()
y = list()
x.append(torch.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW)))
y.append(torch.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW)))
for i in range(1,self.num_keypoints):
x.append(output.index_select(2, Variable(torch.cuda.LongTensor([2 * i + 0]))).view(nB, nA, nH, nW))
y.append(output.index_select(2, Variable(torch.cuda.LongTensor([2 * i + 1]))).view(nB, nA, nH, nW))
conf = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([2*self.num_keypoints]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(2*self.num_keypoints+1,2*self.num_keypoints+1+nC-1,nC).long().cuda()))
cls = cls.view(nB*nA, nC, nH*nW).transpose(1,2).contiguous().view(nB*nA*nH*nW, nC)
t1 = time.time()
# Create pred boxes
pred_corners = torch.cuda.FloatTensor(18, nB*nA*nH*nW)
pred_corners = torch.cuda.FloatTensor(2*self.num_keypoints, nB*nA*nH*nW)
grid_x = torch.linspace(0, nW-1, nW).repeat(nH,1).repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
grid_y = torch.linspace(0, nH-1, nH).repeat(nW,1).t().repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
pred_corners[0] = (x0.data + grid_x) / nW
pred_corners[1] = (y0.data + grid_y) / nH
pred_corners[2] = (x1.data + grid_x) / nW
pred_corners[3] = (y1.data + grid_y) / nH
pred_corners[4] = (x2.data + grid_x) / nW
pred_corners[5] = (y2.data + grid_y) / nH
pred_corners[6] = (x3.data + grid_x) / nW
pred_corners[7] = (y3.data + grid_y) / nH
pred_corners[8] = (x4.data + grid_x) / nW
pred_corners[9] = (y4.data + grid_y) / nH
pred_corners[10] = (x5.data + grid_x) / nW
pred_corners[11] = (y5.data + grid_y) / nH
pred_corners[12] = (x6.data + grid_x) / nW
pred_corners[13] = (y6.data + grid_y) / nH
pred_corners[14] = (x7.data + grid_x) / nW
pred_corners[15] = (y7.data + grid_y) / nH
pred_corners[16] = (x8.data + grid_x) / nW
pred_corners[17] = (y8.data + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,18)
for i in range(self.num_keypoints):
pred_corners[2 * i + 0] = (x[i].data.view_as(grid_x) + grid_x) / nW
pred_corners[2 * i + 1] = (y[i].data.view_as(grid_y) + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,2*self.num_keypoints)
pred_corners = convert2cpu(gpu_matrix)
t2 = time.time()
# Build targets
nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls = \
build_targets(pred_corners, target.data, self.anchors, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
nGT, nCorrect, coord_mask, conf_mask, cls_mask, txs, tys, tconf, tcls = \
build_targets(pred_corners, target.data, self.num_keypoints, self.anchors, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
cls_mask = (cls_mask == 1)
nProposals = int((conf > 0.25).sum().data[0])
tx0 = Variable(tx0.cuda())
ty0 = Variable(ty0.cuda())
tx1 = Variable(tx1.cuda())
ty1 = Variable(ty1.cuda())
tx2 = Variable(tx2.cuda())
ty2 = Variable(ty2.cuda())
tx3 = Variable(tx3.cuda())
ty3 = Variable(ty3.cuda())
tx4 = Variable(tx4.cuda())
ty4 = Variable(ty4.cuda())
tx5 = Variable(tx5.cuda())
ty5 = Variable(ty5.cuda())
tx6 = Variable(tx6.cuda())
ty6 = Variable(ty6.cuda())
tx7 = Variable(tx7.cuda())
ty7 = Variable(ty7.cuda())
tx8 = Variable(tx8.cuda())
ty8 = Variable(ty8.cuda())
for i in range(self.num_keypoints):
txs[i] = Variable(txs[i].cuda())
tys[i] = Variable(tys[i].cuda())
tconf = Variable(tconf.cuda())
tcls = Variable(tcls.view(-1)[cls_mask].long().cuda())
tcls = Variable(tcls[cls_mask].long().cuda())
coord_mask = Variable(coord_mask.cuda())
conf_mask = Variable(conf_mask.cuda().sqrt())
cls_mask = Variable(cls_mask.view(-1, 1).repeat(1,nC).cuda())
@ -267,35 +158,24 @@ class RegionLoss(nn.Module):
t3 = time.time()
# Create loss
loss_x0 = self.coord_scale * nn.MSELoss(size_average=False)(x0*coord_mask, tx0*coord_mask)/2.0
loss_y0 = self.coord_scale * nn.MSELoss(size_average=False)(y0*coord_mask, ty0*coord_mask)/2.0
loss_x1 = self.coord_scale * nn.MSELoss(size_average=False)(x1*coord_mask, tx1*coord_mask)/2.0
loss_y1 = self.coord_scale * nn.MSELoss(size_average=False)(y1*coord_mask, ty1*coord_mask)/2.0
loss_x2 = self.coord_scale * nn.MSELoss(size_average=False)(x2*coord_mask, tx2*coord_mask)/2.0
loss_y2 = self.coord_scale * nn.MSELoss(size_average=False)(y2*coord_mask, ty2*coord_mask)/2.0
loss_x3 = self.coord_scale * nn.MSELoss(size_average=False)(x3*coord_mask, tx3*coord_mask)/2.0
loss_y3 = self.coord_scale * nn.MSELoss(size_average=False)(y3*coord_mask, ty3*coord_mask)/2.0
loss_x4 = self.coord_scale * nn.MSELoss(size_average=False)(x4*coord_mask, tx4*coord_mask)/2.0
loss_y4 = self.coord_scale * nn.MSELoss(size_average=False)(y4*coord_mask, ty4*coord_mask)/2.0
loss_x5 = self.coord_scale * nn.MSELoss(size_average=False)(x5*coord_mask, tx5*coord_mask)/2.0
loss_y5 = self.coord_scale * nn.MSELoss(size_average=False)(y5*coord_mask, ty5*coord_mask)/2.0
loss_x6 = self.coord_scale * nn.MSELoss(size_average=False)(x6*coord_mask, tx6*coord_mask)/2.0
loss_y6 = self.coord_scale * nn.MSELoss(size_average=False)(y6*coord_mask, ty6*coord_mask)/2.0
loss_x7 = self.coord_scale * nn.MSELoss(size_average=False)(x7*coord_mask, tx7*coord_mask)/2.0
loss_y7 = self.coord_scale * nn.MSELoss(size_average=False)(y7*coord_mask, ty7*coord_mask)/2.0
loss_x8 = self.coord_scale * nn.MSELoss(size_average=False)(x8*coord_mask, tx8*coord_mask)/2.0
loss_y8 = self.coord_scale * nn.MSELoss(size_average=False)(y8*coord_mask, ty8*coord_mask)/2.0
loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
loss_x = loss_x0 + loss_x1 + loss_x2 + loss_x3 + loss_x4 + loss_x5 + loss_x6 + loss_x7 + loss_x8
loss_y = loss_y0 + loss_y1 + loss_y2 + loss_y3 + loss_y4 + loss_y5 + loss_y6 + loss_y7 + loss_y8
loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
loss = loss_x + loss_y + loss_conf + loss_cls
print('%d: nGT %d, recall %d, proposals %d, loss: x0: %f x %f, y0: %f y %f, conf %f, cls %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x0.data[0], loss_x.data[0], loss_y0.data[0], loss_y.data[0], loss_conf.data[0], loss_cls.data[0], loss.data[0]))
#else:
# loss = loss_x + loss_y + loss_conf
# print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss.data[0]))
loss_xs = list()
loss_ys = list()
for i in range(self.num_keypoints):
loss_xs.append(self.coord_scale * nn.MSELoss(size_average=False)(x[i]*coord_mask, txs[i]*coord_mask)/2.0)
loss_ys.append(self.coord_scale * nn.MSELoss(size_average=False)(y[i]*coord_mask, tys[i]*coord_mask)/2.0)
loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
loss_x = np.sum(loss_xs)
loss_y = np.sum(loss_ys)
loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
if epoch > self.pretrain_num_epochs:
loss = loss_x + loss_y + loss_cls + loss_conf # in single object pose estimation, there is no classification loss
else:
# pretrain initially without confidence loss
# once the coordinate predictions get better, start training for confidence as well
loss = loss_x + loss_y + loss_cls
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, cls %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss_cls.data[0], loss.data[0]))
t4 = time.time()
if False:


@ -2,6 +2,7 @@ from __future__ import print_function
import os
os.sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import argparse
import time
import torch
import torch.nn as nn
@ -17,16 +18,10 @@ from torch.autograd import Variable # Useful info about autograd: http://pytorch
from darknet_multi import Darknet
from MeshPly import MeshPly
from utils import *
from utils_multi import *
from cfg import parse_cfg
import dataset_multi
from region_loss_multi import RegionLoss
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
import dataset_multi
# Adjust learning rate during training, learning schedule can be changed in network config file
def adjust_learning_rate(optimizer, batch):
@ -89,7 +84,7 @@ def train(epoch):
t6 = time.time()
region_loss.seen = region_loss.seen + data.data.size(0)
# Compute loss, grow an array of losses for saving later on
loss = region_loss(output, target)
loss = region_loss(output, target, epoch)
training_iters.append(epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter)
training_losses.append(convert2cpu(loss.data))
niter += 1
@ -125,7 +120,7 @@ def train(epoch):
t1 = time.time()
return epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter - 1
def eval(niter, datacfg, cfgfile):
def eval(niter, datacfg):
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
@ -137,13 +132,12 @@ def eval(niter, datacfg, cfgfile):
meshname = options['mesh']
backupdir = options['backup']
name = options['name']
prefix = 'results'
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
internal_calibration = get_camera_intrinsic(u0, v0, fx, fy)
# Get validation file names
with open(valid_images) as fp:
@ -194,8 +188,9 @@ def eval(niter, datacfg, cfgfile):
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
trgt = target[0].view(-1, 21)
all_boxes = get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
trgt = target[0].view(-1, num_labels)
all_boxes = get_multi_region_boxes(output, conf_thresh, num_classes, num_keypoints, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
t4 = time.time()
# Iterate through all batch elements
@ -205,7 +200,7 @@ def eval(niter, datacfg, cfgfile):
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
truths = target[i].view(-1, num_labels)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
@ -213,24 +208,23 @@ def eval(niter, datacfg, cfgfile):
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
box_gt = list()
for j in range(1, num_labels):
box_gt.append(truths[k][j])
box_gt.extend([1.0, 1.0])
box_gt.append(truths[k][0])
# If the prediction has the highest confidence, choose it as our prediction
best_conf_est = -sys.maxsize
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est) and (boxes[j][20] == int(truths[k][0])):
best_conf_est = boxes[j][18]
if (boxes[j][2*num_keypoints] > best_conf_est) and (boxes[j][2*num_keypoints+2] == int(truths[k][0])):
best_conf_est = boxes[j][2*num_keypoints]
box_pr = boxes[j]
bb2d_gt = get_2d_bb(box_gt[:18], output.size(3))
bb2d_pr = get_2d_bb(box_pr[:18], output.size(3))
iou = bbox_iou(bb2d_gt, bb2d_pr)
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
match = corner_confidence(box_gt[:2*num_keypoints], torch.FloatTensor(boxes[j][:2*num_keypoints]))
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt = np.array(np.reshape(box_gt[:2*num_keypoints], [num_keypoints, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:2*num_keypoints], [num_keypoints, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
@ -244,23 +238,24 @@ def eval(niter, datacfg, cfgfile):
R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration))
proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration))
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
# Sum errors
testing_error_pixel += pixel_dist
testing_samples += 1
testing_error_pixel += pixel_dist
testing_samples += 1
t5 = time.time()
# Compute 2D reprojection score
eps = 1e-5
for px_threshold in [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]:
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
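For intuition, the accuracy metric above simply counts the fraction of test samples whose mean vertex reprojection error stays below the pixel threshold. A minimal sketch with made-up error values:

```
import numpy as np

errs_2d = [2.3, 4.1, 7.8, 55.0]   # hypothetical per-sample mean pixel errors
px_threshold, eps = 5, 1e-5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d) + eps)
print('Acc using %d px 2D Projection = %.2f%%' % (px_threshold, acc))   # -> 50.00%
```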
@ -281,72 +276,79 @@ def eval(niter, datacfg, cfgfile):
def test(niter):
cfgfile = 'cfg/yolo-pose-multi.cfg'
modelcfg = 'cfg/yolo-pose-multi.cfg'
datacfg = 'cfg/ape_occlusion.data'
logging("Testing ape...")
eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
datacfg = 'cfg/can_occlusion.data'
logging("Testing can...")
eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
datacfg = 'cfg/cat_occlusion.data'
logging("Testing cat...")
eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
datacfg = 'cfg/duck_occlusion.data'
logging("Testing duck...")
eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
datacfg = 'cfg/driller_occlusion.data'
logging("Testing driller...")
eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
datacfg = 'cfg/glue_occlusion.data'
logging("Testing glue...")
eval(niter, datacfg, cfgfile)
# datacfg = 'cfg/holepuncher_occlusion.data'
# logging("Testing holepuncher...")
# eval(niter, datacfg, cfgfile)
eval(niter, datacfg)
if __name__ == "__main__":
# Training settings
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
# Parse command window input
parser = argparse.ArgumentParser(description='SingleShotPose')
parser.add_argument('--datacfg', type=str, default='cfg/occlusion.data') # data config
parser.add_argument('--modelcfg', type=str, default='cfg/yolo-pose-multi.cfg') # network config
parser.add_argument('--initweightfile', type=str, default='backup_multi/init.weights') # initialization weights
parser.add_argument('--pretrain_num_epochs', type=int, default=0) # how many epoch to pretrain
args = parser.parse_args()
datacfg = args.datacfg
modelcfg = args.modelcfg
initweightfile = args.initweightfile
pretrain_num_epochs = args.pretrain_num_epochs
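For reference, a typical invocation of this training script through the new argparse interface (assuming it is saved as train_multi.py, as in the repository layout; the 15-epoch pretraining value is purely illustrative) would be:

```
python train_multi.py --datacfg cfg/occlusion.data --modelcfg cfg/yolo-pose-multi.cfg --initweightfile backup_multi/init.weights --pretrain_num_epochs 15
```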
# Parse configuration files
data_options = read_data_cfg(datacfg)
net_options = parse_cfg(cfgfile)[0]
trainlist = data_options['train']
nsamples = file_lines(trainlist)
gpus = data_options['gpus'] # e.g. 0,1,2,3
gpus = '0'
num_workers = int(data_options['num_workers'])
backupdir = data_options['backup']
if not os.path.exists(backupdir):
makedirs(backupdir)
# Parse data configuration file
data_options = read_data_cfg(datacfg)
trainlist = data_options['train']
gpus = data_options['gpus']
num_workers = int(data_options['num_workers'])
backupdir = data_options['backup']
im_width = int(data_options['im_width'])
im_height = int(data_options['im_height'])
fx = float(data_options['fx'])
fy = float(data_options['fy'])
u0 = float(data_options['u0'])
v0 = float(data_options['v0'])
# Parse network and training configuration parameters
net_options = parse_cfg(modelcfg)[0]
loss_options = parse_cfg(modelcfg)[-1]
batch_size = int(net_options['batch'])
max_batches = int(net_options['max_batches'])
max_epochs = int(net_options['max_epochs'])
learning_rate = float(net_options['learning_rate'])
momentum = float(net_options['momentum'])
decay = float(net_options['decay'])
conf_thresh = float(net_options['conf_thresh'])
num_keypoints = int(net_options['num_keypoints'])
num_classes = int(loss_options['classes'])
num_anchors = int(loss_options['num'])
steps = [float(step) for step in net_options['steps'].split(',')]
scales = [float(scale) for scale in net_options['scales'].split(',')]
bg_file_names = get_all_files('../VOCdevkit/VOC2012/JPEGImages')
anchors = [float(anchor) for anchor in loss_options['anchors'].split(',')]
# Train parameters
max_epochs = 700 # max_batches*batch_size/nsamples+1
# Further params
if not os.path.exists(backupdir):
makedirs(backupdir)
bg_file_names = get_all_files('../VOCdevkit/VOC2012/JPEGImages')
nsamples = file_lines(trainlist)
use_cuda = True
seed = int(time.time())
eps = 1e-5
save_interval = 10 # epoches
dot_interval = 70 # batches
best_acc = -1
# Test parameters
conf_thresh = 0.05
nms_thresh = 0.4
match_thresh = 0.5
iou_thresh = 0.5
im_width = 640
im_height = 480
best_acc = -sys.maxsize
num_labels = num_keypoints*2+3 # + 2 for image width, height, +1 for image class
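As a quick sanity check of this label layout (assuming the default 9-keypoint setting, i.e. the centroid plus the 8 bounding-box corners), the computed value matches the 21 that was previously hard-coded in target.view(-1, 21):

```
num_keypoints = 9
num_labels = num_keypoints * 2 + 3   # 18 keypoint coordinates + 2D width/height + class id
assert num_labels == 21              # same size as the old hard-coded target.view(-1, 21)
```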
# Specify which gpus to use
torch.manual_seed(seed)
@ -355,12 +357,11 @@ if __name__ == "__main__":
torch.cuda.manual_seed(seed)
# Specify the model and the loss
model = Darknet(cfgfile)
region_loss = model.loss
model = Darknet(modelcfg)
region_loss = RegionLoss(num_keypoints=num_keypoints, num_classes=num_classes, anchors=anchors, num_anchors=num_anchors, pretrain_num_epochs=pretrain_num_epochs)
# Model settings
# model.load_weights(weightfile)
model.load_weights_until_last(weightfile)
model.load_weights_until_last(initweightfile)
model.print_network()
model.seen = 0
region_loss.iter = model.iter
@ -368,20 +369,18 @@ if __name__ == "__main__":
processed_batches = model.seen/batch_size
init_width = model.width
init_height = model.height
init_epoch = model.seen/nsamples
# Variable to save
training_iters = []
training_losses = []
testing_iters = []
testing_errors_pixel = []
testing_accuracies = []
init_epoch = model.seen//nsamples
# Variables to save
training_iters = []
training_losses = []
testing_iters = []
testing_errors_pixel = []
testing_accuracies = []
# Specify the number of workers
kwargs = {'num_workers': num_workers, 'pin_memory': True} if use_cuda else {}
# Pass the model to GPU
if use_cuda:
# model = model.cuda()
@ -396,7 +395,6 @@ if __name__ == "__main__":
else:
params += [{'params': [value], 'weight_decay': decay*batch_size}]
optimizer = optim.SGD(model.parameters(), lr=learning_rate/batch_size, momentum=momentum, dampening=0, weight_decay=decay*batch_size)
# optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimization
evaluate = False
if evaluate:
@ -416,9 +414,9 @@ if __name__ == "__main__":
testing_iters=testing_iters,
testing_accuracies=testing_accuracies,
testing_errors_pixel=testing_errors_pixel)
if (np.mean(testing_accuracies[-5:]) > best_acc ):
best_acc = np.mean(testing_accuracies[-5:])
if (np.mean(testing_accuracies[-6:]) > best_acc ): # testing for 6 different objects
best_acc = np.mean(testing_accuracies[-6:])
logging('best model so far!')
logging('save weights to %s/model.weights' % (backupdir))
model.module.save_weights('%s/model.weights' % (backupdir))
shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))
# shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))


@ -0,0 +1,502 @@
import sys
import os
import time
import math
import torch
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from torch.autograd import Variable
import torch.nn.functional as F
import cv2
from scipy import spatial
import struct
import imghdr
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
# Get all the files within a directory
def get_all_files(directory):
files = []
for f in os.listdir(directory):
if os.path.isfile(os.path.join(directory, f)):
files.append(os.path.join(directory, f))
else:
files.extend(get_all_files(os.path.join(directory, f)))
return files
# Calculate angular distance between two rotations
def calcAngularDistance(gt_rot, pr_rot):
rotDiff = np.dot(gt_rot, np.transpose(pr_rot))
trace = np.trace(rotDiff)
return np.rad2deg(np.arccos((trace-1.0)/2.0))
# Get camera intrinsic matrix
def get_camera_intrinsic(u0, v0, fx, fy):
'''fx, fy: focal length parameters, u0, v0: principal point offset parameters'''
return np.array([[fx, 0.0, u0], [0.0, fy, v0], [0.0, 0.0, 1.0]])
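A small usage sketch of this helper (the numbers below are placeholders rather than any dataset's actual calibration; in the refactored scripts fx, fy, u0 and v0 are read from the *.data configuration file):

```
fx, fy = 572.4, 573.6   # illustrative focal lengths in pixels
u0, v0 = 325.3, 242.0   # illustrative principal point in pixels
K = get_camera_intrinsic(u0, v0, fx, fy)
# K == [[fx, 0, u0], [0, fy, v0], [0, 0, 1]]
```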
# Compute the projection of an array of 3D points onto a 2D image given the intrinsics and extrinsics
def compute_projection(points_3D, transformation, internal_calibration):
projections_2d = np.zeros((2, points_3D.shape[1]), dtype='float32')
camera_projection = (internal_calibration.dot(transformation)).dot(points_3D)
projections_2d[0, :] = camera_projection[0, :]/camera_projection[2, :]
projections_2d[1, :] = camera_projection[1, :]/camera_projection[2, :]
return projections_2d
# Transform an array of 3D points in the 3D space using extrinsics
def compute_transformation(points_3D, transformation):
return transformation.dot(points_3D)
# Calculate the diameter of an object model; the diameter is defined as the largest pairwise distance between any two points of the model
def calc_pts_diameter(pts):
diameter = -1
for pt_id in range(pts.shape[0]):
pt_dup = np.tile(np.array([pts[pt_id, :]]), [pts.shape[0] - pt_id, 1])
pts_diff = pt_dup - pts[pt_id:, :]
max_dist = math.sqrt((pts_diff * pts_diff).sum(axis=1).max())
if max_dist > diameter:
diameter = max_dist
return diameter
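A quick numeric check of this function: for the eight corners of a unit cube the largest pairwise distance is the space diagonal, sqrt(3) ≈ 1.732.

```
import itertools
import numpy as np

cube = np.array(list(itertools.product([0.0, 1.0], repeat=3)))   # 8 corners of a unit cube
print(calc_pts_diameter(cube))   # ~1.7320508
```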
# Compute adi metric, see https://github.com/thodan/obj_pose_eval/blob/master/obj_pose_eval/pose_error.py for further info
def adi(pts_est, pts_gt):
nn_index = spatial.cKDTree(pts_est)
nn_dists, _ = nn_index.query(pts_gt, k=1)
e = nn_dists.mean()
return e
# Get the 3D corners of the bounding box surrounding the object model
def get_3D_corners(vertices):
min_x = np.min(vertices[0,:])
max_x = np.max(vertices[0,:])
min_y = np.min(vertices[1,:])
max_y = np.max(vertices[1,:])
min_z = np.min(vertices[2,:])
max_z = np.max(vertices[2,:])
corners = np.array([[min_x, min_y, min_z],
[min_x, min_y, max_z],
[min_x, max_y, min_z],
[min_x, max_y, max_z],
[max_x, min_y, min_z],
[max_x, min_y, max_z],
[max_x, max_y, min_z],
[max_x, max_y, max_z]])
corners = np.concatenate((np.transpose(corners), np.ones((1,8)) ), axis=0)
return corners
# Compute pose using PnP
def pnp(points_3D, points_2D, cameraMatrix):
try:
distCoeffs = pnp.distCoeffs
except:
distCoeffs = np.zeros((8, 1), dtype='float32') # 8 distortion-coefficient model
assert points_3D.shape[0] == points_2D.shape[0], 'points 3D and points 2D must have same number of vertices'
_, R_exp, t = cv2.solvePnP(points_3D,
np.ascontiguousarray(points_2D[:,:2]).reshape((-1,1,2)),
cameraMatrix,
distCoeffs)
R, _ = cv2.Rodrigues(R_exp)
return R, t
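The evaluation loop uses this helper together with compute_projection; a condensed sketch of that flow, with variable names mirroring the surrounding scripts (corners2D_pr is assumed to already be the denormalized num_keypoints x 2 prediction, and u0, v0, fx, fy come from the data config):

```
# Centroid (object origin) followed by the 8 bounding-box corners as 3D reference points
objpoints3D = np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32')
K = np.array(get_camera_intrinsic(u0, v0, fx, fy), dtype='float32')
R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)          # pose from 2D-3D correspondences
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_pred = compute_projection(vertices, Rt_pr, K)   # reprojected model points for the pixel-error metric
```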
# Get the tightest bounding box surrounding keypoints
def get_2d_bb(box, size):
x = box[0]
y = box[1]
min_x = np.min(np.reshape(box, [-1,2])[:,0])
max_x = np.max(np.reshape(box, [-1,2])[:,0])
min_y = np.min(np.reshape(box, [-1,2])[:,1])
max_y = np.max(np.reshape(box, [-1,2])[:,1])
w = max_x - min_x
h = max_y - min_y
new_box = [x*size, y*size, w*size, h*size]
return new_box
# Compute IoU between two bounding boxes
def bbox_iou(box1, box2, x1y1x2y2=False):
if x1y1x2y2:
mx = min(box1[0], box2[0])
Mx = max(box1[2], box2[2])
my = min(box1[1], box2[1])
My = max(box1[3], box2[3])
w1 = box1[2] - box1[0]
h1 = box1[3] - box1[1]
w2 = box2[2] - box2[0]
h2 = box2[3] - box2[1]
else:
mx = min(box1[0]-box1[2]/2.0, box2[0]-box2[2]/2.0)
Mx = max(box1[0]+box1[2]/2.0, box2[0]+box2[2]/2.0)
my = min(box1[1]-box1[3]/2.0, box2[1]-box2[3]/2.0)
My = max(box1[1]+box1[3]/2.0, box2[1]+box2[3]/2.0)
w1 = box1[2]
h1 = box1[3]
w2 = box2[2]
h2 = box2[3]
uw = Mx - mx
uh = My - my
cw = w1 + w2 - uw
ch = h1 + h2 - uh
carea = 0
if cw <= 0 or ch <= 0:
return 0.0
area1 = w1 * h1
area2 = w2 * h2
carea = cw * ch
uarea = area1 + area2 - carea
return carea/uarea
# Compute confidences of current keypoint predictions
def corner_confidences(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (2*num_keypoints x nA), type: torch.FloatTensor
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (2*num_keypoints x nA), type: torch.FloatTensor
th : distance threshold, type: int
sharpness : sharpness of the exponential that assigns a confidence value to the distance
-----------
return : a torch.FloatTensor of shape (nA,) with one mean confidence value per anchor
'''
shape = gt_corners.size()
nA = shape[1]
dist = gt_corners - pr_corners
num_el = dist.numel()
num_keypoints = num_el//(nA*2)
dist = dist.t().contiguous().view(nA, num_keypoints, 2)
dist[:, :, 0] = dist[:, :, 0] * im_width
dist[:, :, 1] = dist[:, :, 1] * im_height
eps = 1e-5
distthresh = torch.FloatTensor([th]).repeat(nA, num_keypoints)
dist = torch.sqrt(torch.sum((dist)**2, dim=2)).squeeze() # nA x 9
mask = (dist < distthresh).type(torch.FloatTensor)
conf = torch.exp(sharpness*(1 - dist/distthresh))-1 # mask * (torch.exp(math.log(2) * (1.0 - dist/rrt)) - 1)
conf0 = torch.exp(sharpness*(1 - torch.zeros(conf.size(0),1))) - 1
conf = conf / conf0.repeat(1, num_keypoints)
# conf = 1 - dist/distthresh
conf = mask * conf # nA x 9
mean_conf = torch.mean(conf, dim=1)
return mean_conf
# Compute confidence of the current keypoint prediction
def corner_confidence(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (2*num_keypoints,), type: list
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (2*num_keypoints,), type: list
th : distance threshold, type: int
sharpness : sharpness of the exponential that assigns a confidence value to the distance
-----------
return : a scalar confidence value, the mean over all keypoints
'''
dist = torch.FloatTensor(gt_corners) - pr_corners
num_keypoints = dist.numel()//2
dist = dist.view(num_keypoints, 2)
dist[:, 0] = dist[:, 0] * im_width
dist[:, 1] = dist[:, 1] * im_height
eps = 1e-5
dist = torch.sqrt(torch.sum((dist)**2, dim=1))
mask = (dist < th).type(torch.FloatTensor)
conf = torch.exp(sharpness * (1.0 - dist/th)) - 1
conf0 = torch.exp(torch.FloatTensor([sharpness])) - 1 + eps
conf = conf / conf0.repeat(num_keypoints, 1)
# conf = 1.0 - dist/th
conf = mask * conf
return torch.mean(conf)
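To make the shape of this confidence function concrete, here is a small numeric check with the default threshold of 80 px and sharpness 2: a perfect keypoint gets confidence 1, the value decays exponentially with pixel distance, and it is clipped to 0 beyond the threshold.

```
import math

def conf_at(d, th=80, sharpness=2):
    if d >= th:
        return 0.0
    return (math.exp(sharpness * (1.0 - d / th)) - 1) / (math.exp(sharpness) - 1)

print(conf_at(0), round(conf_at(40), 3), conf_at(80))   # 1.0 0.269 0.0
```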
# Compute sigmoid
def sigmoid(x):
return 1.0/(math.exp(-x)+1.)
# Compute softmax function
def softmax(x):
x = torch.exp(x - torch.max(x))
x = x/x.sum()
return x
# Apply non-maxima suppression on a set of bounding boxes
def nms(boxes, nms_thresh):
if len(boxes) == 0:
return boxes
det_confs = torch.zeros(len(boxes))
for i in range(len(boxes)):
det_confs[i] = 1-boxes[i][4]
_,sortIds = torch.sort(det_confs)
out_boxes = []
for i in range(len(boxes)):
box_i = boxes[sortIds[i]]
if box_i[4] > 0:
out_boxes.append(box_i)
for j in range(i+1, len(boxes)):
box_j = boxes[sortIds[j]]
if bbox_iou(box_i, box_j, x1y1x2y2=False) > nms_thresh:
box_j[4] = 0
return out_boxes
# Fix the wrong order of corners on the Occlusion dataset
def fix_corner_order(corners2D_gt):
corners2D_gt_corrected = np.zeros((9, 2), dtype='float32')
corners2D_gt_corrected[0, :] = corners2D_gt[0, :]
corners2D_gt_corrected[1, :] = corners2D_gt[1, :]
corners2D_gt_corrected[2, :] = corners2D_gt[3, :]
corners2D_gt_corrected[3, :] = corners2D_gt[5, :]
corners2D_gt_corrected[4, :] = corners2D_gt[7, :]
corners2D_gt_corrected[5, :] = corners2D_gt[2, :]
corners2D_gt_corrected[6, :] = corners2D_gt[4, :]
corners2D_gt_corrected[7, :] = corners2D_gt[6, :]
corners2D_gt_corrected[8, :] = corners2D_gt[8, :]
return corners2D_gt_corrected
# Convert float tensors in GPU to tensors in CPU
def convert2cpu(gpu_matrix):
return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)
# Convert long tensors in GPU to tensors in CPU
def convert2cpu_long(gpu_matrix):
return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)
# Get potential sets of predictions at test time
def get_multi_region_boxes(output, conf_thresh, num_classes, num_keypoints, anchors, num_anchors, correspondingclass, only_objectness=1, validation=False):
# Parameters
anchor_step = len(anchors)//num_anchors
if output.dim() == 3:
output = output.unsqueeze(0)
batch = output.size(0)
assert(output.size(1) == (2*num_keypoints+1+num_classes)*num_anchors)
h = output.size(2)
w = output.size(3)
# Activation
t0 = time.time()
all_boxes = []
max_conf = -sys.maxsize
max_cls_conf = -sys.maxsize
output = output.view(batch*num_anchors, 2*num_keypoints+1+num_classes, h*w).transpose(0,1).contiguous().view(2*num_keypoints+1+num_classes, batch*num_anchors*h*w)
grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
xs = list()
ys = list()
xs.append(torch.sigmoid(output[0]) + grid_x)
ys.append(torch.sigmoid(output[1]) + grid_y)
for j in range(1,num_keypoints):
xs.append(output[2*j + 0] + grid_x)
ys.append(output[2*j + 1] + grid_y)
det_confs = torch.sigmoid(output[2*num_keypoints])
cls_confs = torch.nn.Softmax()(Variable(output[2*num_keypoints+1:2*num_keypoints+1+num_classes].transpose(0,1))).data
cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
cls_max_confs = cls_max_confs.view(-1)
cls_max_ids = cls_max_ids.view(-1)
t1 = time.time()
# GPU to CPU
sz_hw = h*w
sz_hwa = sz_hw*num_anchors
det_confs = convert2cpu(det_confs)
cls_max_confs = convert2cpu(cls_max_confs)
cls_max_ids = convert2cpu_long(cls_max_ids)
for j in range(num_keypoints):
xs[j] = convert2cpu(xs[j])
ys[j] = convert2cpu(ys[j])
if validation:
cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
t2 = time.time()
# Boxes filter
for b in range(batch):
boxes = []
max_conf = -1
for cy in range(h):
for cx in range(w):
for i in range(num_anchors):
ind = b*sz_hwa + i*sz_hw + cy*w + cx
det_conf = det_confs[ind]
if only_objectness:
conf = det_confs[ind]
else:
conf = det_confs[ind] * cls_max_confs[ind]
if (det_confs[ind] > max_conf) and (cls_confs[ind, correspondingclass] > max_cls_conf):
max_conf = det_confs[ind]
max_cls_conf = cls_confs[ind, correspondingclass]
max_ind = ind
if conf > conf_thresh:
bcx = list()
bcy = list()
for j in range(num_keypoints):
bcx.append(xs[j][ind])
bcy.append(ys[j][ind])
cls_max_conf = cls_max_confs[ind]
cls_max_id = cls_max_ids[ind]
box = list()
for j in range(num_keypoints):
box.append(bcx[j]/w)
box.append(bcy[j]/h)
box.append(det_conf)
box.append(cls_max_conf)
box.append(cls_max_id)
if (not only_objectness) and validation:
for c in range(num_classes):
tmp_conf = cls_confs[ind][c]
if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh:
box.append(tmp_conf)
box.append(c)
boxes.append(box)
if (len(boxes) == 0) or (not (correspondingclass in np.array(boxes)[:,2*num_keypoints+2])):
bcx = list()
bcy = list()
for j in range(num_keypoints):
bcx.append(xs[j][max_ind])
bcy.append(ys[j][max_ind])
cls_max_conf = max_cls_conf # cls_max_confs[max_ind]
cls_max_id = correspondingclass # cls_max_ids[max_ind]
det_conf = max_conf # det_confs[max_ind]
box = list()
for j in range(num_keypoints):
box.append(bcx[j]/w)
box.append(bcy[j]/h)
box.append(det_conf)
box.append(cls_max_conf)
box.append(cls_max_id)
boxes.append(box)
all_boxes.append(boxes)
else:
all_boxes.append(boxes)
t3 = time.time()
if False:
print('---------------------------------')
print('matrix computation : %f' % (t1-t0))
print(' gpu to cpu : %f' % (t2-t1))
print(' boxes filter : %f' % (t3-t2))
print('---------------------------------')
return all_boxes
# Read the labels from the file
def read_truths(lab_path, num_keypoints=9):
num_labels = 2*num_keypoints+3 # +2 for width, height, +1 for class label
if os.path.getsize(lab_path):
truths = np.loadtxt(lab_path)
truths = truths.reshape(truths.size//num_labels, num_labels) # to avoid single truth problem
return truths
else:
return np.array([])
def read_truths_args(lab_path, num_keypoints=9):
num_labels = 2*num_keypoints+1
truths = read_truths(lab_path)
new_truths = []
for i in range(truths.shape[0]):
for j in range(num_labels):
new_truths.append(truths[i][j])
return np.array(new_truths)
def read_pose(lab_path):
if os.path.getsize(lab_path):
truths = np.loadtxt(lab_path)
return truths
else:
return np.array([])
def load_class_names(namesfile):
class_names = []
with open(namesfile, 'r') as fp:
lines = fp.readlines()
for line in lines:
line = line.rstrip()
class_names.append(line)
return class_names
def image2torch(img):
width = img.width
height = img.height
img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes()))
img = img.view(height, width, 3).transpose(0,1).transpose(0,2).contiguous()
img = img.view(1, 3, height, width)
img = img.float().div(255.0)
return img
def read_data_cfg(datacfg):
options = dict()
options['gpus'] = '0,1,2,3'
options['num_workers'] = '10'
with open(datacfg, 'r') as fp:
lines = fp.readlines()
for line in lines:
line = line.strip()
if line == '':
continue
key,value = line.split('=')
key = key.strip()
value = value.strip()
options[key] = value
return options
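A brief usage sketch of the config reader (the file name is only an example; the key = value format matches the cfg/*.data files added elsewhere in this commit):

```
options = read_data_cfg('cfg/ape_occlusion.data')
meshname = options['mesh']
name = options['name']
gpus = options['gpus']   # falls back to '0,1,2,3' if the file does not set it
```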
def scale_bboxes(bboxes, width, height):
import copy
dets = copy.deepcopy(bboxes)
for i in range(len(dets)):
dets[i][0] = dets[i][0] * width
dets[i][1] = dets[i][1] * height
dets[i][2] = dets[i][2] * width
dets[i][3] = dets[i][3] * height
return dets
def file_lines(thefilepath):
count = 0
thefile = open(thefilepath, 'rb')
while True:
buffer = thefile.read(8192*1024)
if not buffer:
break
count += buffer.count(b'\n')
thefile.close( )
return count
def get_image_size(fname):
'''Determine the image type of fname and return its size.
from draco'''
with open(fname, 'rb') as fhandle:
head = fhandle.read(24)
if len(head) != 24:
return
if imghdr.what(fname) == 'png':
check = struct.unpack('>i', head[4:8])[0]
if check != 0x0d0a1a0a:
return
width, height = struct.unpack('>ii', head[16:24])
elif imghdr.what(fname) == 'gif':
width, height = struct.unpack('<HH', head[6:10])
elif imghdr.what(fname) == 'jpeg' or imghdr.what(fname) == 'jpg':
try:
fhandle.seek(0) # Read 0xff next
size = 2
ftype = 0
while not 0xc0 <= ftype <= 0xcf:
fhandle.seek(size, 1)
byte = fhandle.read(1)
while ord(byte) == 0xff:
byte = fhandle.read(1)
ftype = ord(byte)
size = struct.unpack('>H', fhandle.read(2))[0] - 2
# We are at a SOFn block
fhandle.seek(1, 1) # Skip `precision' byte.
height, width = struct.unpack('>HH', fhandle.read(4))
except Exception: #IGNORE:W0703
return
else:
return
return width, height
def logging(message):
print('%s %s' % (time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), message))


@ -1,77 +1,75 @@
import os
os.sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import torch
from torch.autograd import Variable
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import scipy.misc
import warnings
import sys
import argparse
warnings.filterwarnings("ignore")
from torch.autograd import Variable
from torchvision import datasets, transforms
from darknet_multi import Darknet
from utils import *
import dataset_multi
from darknet_multi import Darknet
from utils_multi import *
from cfg import parse_cfg
from MeshPly import MeshPly
def valid(datacfg, cfgfile, weightfile, conf_th):
def valid(datacfg, cfgfile, weightfile):
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
return i
# Parse configuration files
options = read_data_cfg(datacfg)
valid_images = options['valid']
meshname = options['mesh']
name = options['name']
prefix = 'results'
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
diam = float(options['diam'])
# Parse data configuration files
data_options = read_data_cfg(datacfg)
valid_images = data_options['valid']
meshname = data_options['mesh']
name = data_options['name']
im_width = int(data_options['im_width'])
im_height = int(data_options['im_height'])
fx = float(data_options['fx'])
fy = float(data_options['fy'])
u0 = float(data_options['u0'])
v0 = float(data_options['v0'])
# Parse net configuration file
net_options = parse_cfg(cfgfile)[0]
loss_options = parse_cfg(cfgfile)[-1]
conf_thresh = float(net_options['conf_thresh'])
num_keypoints = int(net_options['num_keypoints'])
num_classes = int(loss_options['classes'])
num_anchors = int(loss_options['num'])
anchors = [float(anchor) for anchor in loss_options['anchors'].split(',')]
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
# Read object model information, get 3D bounding box corners, get intrinsics
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
diam = float(data_options['diam'])
intrinsic_calibration = get_camera_intrinsic(u0, v0, fx, fy) # camera params
# Get validation file names
with open(valid_images) as fp:
# Network I/O params
num_labels = 2*num_keypoints+3 # +2 for width, height, +1 for object class
errs_2d = [] # to save
with open(valid_images) as fp: # validation file names
tmp_files = fp.readlines()
valid_files = [item.rstrip() for item in tmp_files]
# Compute-related Parameters
use_cuda = True # whether to use cuda or not
kwargs = {'num_workers': 4, 'pin_memory': True} # number of workers etc.
# Specify model, load pretrained weights, pass to GPU and set the module to evaluation mode
model = Darknet(cfgfile)
model.load_weights(weightfile)
model.cuda()
model.eval()
# Get the parser for the test dataset
valid_dataset = dataset_multi.listDataset(valid_images, shape=(model.width, model.height),
shuffle=False,
objclass=name,
transform=transforms.Compose([
transforms.ToTensor(),
]))
valid_batchsize = 1
# Specify the number of workers for multiple processing, get the dataloader for the test dataset
kwargs = {'num_workers': 4, 'pin_memory': True}
test_loader = torch.utils.data.DataLoader(
valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs)
# Parameters
use_cuda = True
num_classes = 13
anchors = [1.4820, 2.2412, 2.0501, 3.1265, 2.3946, 4.6891, 3.1018, 3.9910, 3.4879, 5.8851]
num_anchors = 5
eps = 1e-5
conf_thresh = conf_th
iou_thresh = 0.5
# Parameters to save
errs_2d = []
edges = [[1, 2], [1, 3], [1, 5], [2, 4], [2, 6], [3, 4], [3, 7], [4, 8], [5, 6], [5, 7], [6, 8], [7, 8]]
edges_corners = [[0, 1], [0, 2], [0, 4], [1, 3], [1, 5], [2, 3], [2, 6], [3, 7], [4, 5], [4, 6], [5, 7], [6, 7]]
# Get the dataloader for the test dataset
valid_dataset = dataset_multi.listDataset(valid_images, shape=(model.width, model.height), shuffle=False, objclass=name, transform=transforms.Compose([transforms.ToTensor(),]))
test_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=1, shuffle=False, **kwargs)
# Iterate through test batches (Batch size for test data is 1)
logging('Testing {}...'.format(name))
@ -92,8 +90,8 @@ def valid(datacfg, cfgfile, weightfile, conf_th):
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
trgt = target[0].view(-1, 21)
all_boxes = get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
trgt = target[0].view(-1, num_labels)
all_boxes = get_multi_region_boxes(output, conf_thresh, num_classes, num_keypoints, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
t4 = time.time()
# Iterate through all images in the batch
@ -103,51 +101,49 @@ def valid(datacfg, cfgfile, weightfile, conf_th):
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
truths = target[i].view(-1, num_labels)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
box_gt = list()
for j in range(1, num_labels):
box_gt.append(truths[k][j])
box_gt.extend([1.0, 1.0])
box_gt.append(truths[k][0])
# If the prediction has the highest confidence, choose it as our prediction
best_conf_est = -sys.maxsize
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est) and (boxes[j][20] == int(truths[k][0])):
best_conf_est = boxes[j][18]
if (boxes[j][2*num_keypoints] > best_conf_est) and (boxes[j][2*num_keypoints+2] == int(truths[k][0])):
best_conf_est = boxes[j][2*num_keypoints]
box_pr = boxes[j]
bb2d_gt = get_2d_bb(box_gt[:18], output.size(3))
bb2d_pr = get_2d_bb(box_pr[:18], output.size(3))
iou = bbox_iou(bb2d_gt, bb2d_pr)
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
match = corner_confidence(box_gt[:2*num_keypoints], torch.FloatTensor(boxes[j][:2*num_keypoints]))
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640
corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480
corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640
corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480
corners2D_gt = np.array(np.reshape(box_gt[:2*num_keypoints], [-1, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:2*num_keypoints], [-1, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
corners2D_pr[:, 1] = corners2D_pr[:, 1] * im_height
corners2D_gt_corrected = fix_corner_order(corners2D_gt) # Fix the order of corners
# Compute [R|t] by pnp
objpoints3D = np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32')
K = np.array(internal_calibration, dtype='float32')
K = np.array(intrinsic_calibration, dtype='float32')
R_gt, t_gt = pnp(objpoints3D, corners2D_gt_corrected, K)
R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration))
proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration))
proj_2d_gt = compute_projection(vertices, Rt_gt, intrinsic_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, intrinsic_calibration)
proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, intrinsic_calibration))
proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, intrinsic_calibration))
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
@ -155,29 +151,28 @@ def valid(datacfg, cfgfile, weightfile, conf_th):
t5 = time.time()
# Compute 2D projection score
eps = 1e-5
for px_threshold in [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]:
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
# Print test statistics
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
if __name__ == '__main__' and __package__ is None:
import sys
if len(sys.argv) == 3:
conf_th = 0.05
cfgfile = sys.argv[1]
weightfile = sys.argv[2]
datacfg = 'cfg/ape_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/can_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/cat_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/duck_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/glue_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/holepuncher_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
else:
print('Usage:')
print(' python valid.py cfgfile weightfile')
parser = argparse.ArgumentParser(description='SingleShotPose')
parser.add_argument('--modelcfg', type=str, default='cfg/yolo-pose-multi.cfg') # network config
parser.add_argument('--initweightfile', type=str, default='backup_multi/model_backup.weights') # initialization weights
args = parser.parse_args()
datacfg = 'cfg/ape_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
datacfg = 'cfg/can_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
datacfg = 'cfg/cat_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
datacfg = 'cfg/duck_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
datacfg = 'cfg/glue_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
datacfg = 'cfg/holepuncher_occlusion.data'
valid(datacfg, args.modelcfg, args.initweightfile)
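As a usage sketch (assuming this validation script is saved as valid_multi.py, matching the repository layout), the whole occlusion benchmark can then be run with the new argparse interface:

```
python valid_multi.py --modelcfg cfg/yolo-pose-multi.cfg --initweightfile backup_multi/model_backup.weights
```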

py2/.gitignore

@ -0,0 +1,332 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
##
## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore
# User-specific files
*.suo
*.user
*.userosscache
*.sln.docstates
*.ipynb_checkpoints
*.DS_Store
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
# Visual Studio 2015/2017 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
# Visual Studio 2017 auto generated files
Generated\ Files/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
# NUNIT
*.VisualState.xml
TestResult.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
# Benchmark Results
BenchmarkDotNet.Artifacts/
# .NET Core
project.lock.json
project.fragment.lock.json
artifacts/
**/Properties/launchSettings.json
# StyleCop
StyleCopReport.xml
# Files built by Visual Studio
*_i.c
*_p.c
*_i.h
*.ilk
*.meta
*.obj
*.iobj
*.pch
*.pdb
*.ipdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb
# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap
# Visual Studio Trace Files
*.e2e
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# JustCode is a .NET coding add-in
.JustCode
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# AxoCover is a Code Coverage Tool
.axoCover/*
!.axoCover/settings.json
# Visual Studio code coverage results
*.coverage
*.coveragexml
# NCrunch
_NCrunch_*
.*crunch*.local.xml
nCrunchTemp_*
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# Note: Comment the next line if you want to checkin your web deploy settings,
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
# Microsoft Azure Web App publish settings. Comment the next line if you want to
# checkin your Azure Web App publish settings, but sensitive information contained
# in these scripts will be unencrypted
PublishScripts/
# NuGet Packages
*.nupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
#!**/[Pp]ackages/repositories.config
# NuGet v3's project.json files produces more ignorable files
*.nuget.props
*.nuget.targets
# Microsoft Azure Build Output
csx/
*.build.csdef
# Microsoft Azure Emulator
ecf/
rcf/
# Windows Store app package directories and files
AppPackages/
BundleArtifacts/
Package.StoreAssociation.xml
_pkginfo.txt
*.appx
# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!*.[Cc]ache/
# Others
ClientBin/
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.jfm
*.pfx
*.publishsettings
orleans.codegen.cs
# Including strong name files can present a security risk
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
#*.snk
# Since there are multiple workflows, uncomment next line to ignore bower_components
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
#bower_components/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
ServiceFabricBackup/
*.rptproj.bak
# SQL Server files
*.mdf
*.ldf
*.ndf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
*.rptproj.rsuser
# Microsoft Fakes
FakesAssemblies/
# GhostDoc plugin setting file
*.GhostDoc.xml
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
*.vbw
# Visual Studio LightSwitch build output
**/*.HTMLClient/GeneratedArtifacts
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
_Pvt_Extensions
# Paket dependency manager
.paket/paket.exe
paket-files/
# FAKE - F# Make
.fake/
# JetBrains Rider
.idea/
*.sln.iml
# CodeRush
.cr/
# Python Tools for Visual Studio (PTVS)
__pycache__/
*.pyc
# Cake - Uncomment if you are using it
# tools/**
# !tools/packages.config
# Tabs Studio
*.tss
# Telerik's JustMock configuration file
*.jmconfig
# BizTalk build output
*.btp.cs
*.btm.cs
*.odx.cs
*.xsd.cs
# OpenCover UI analysis results
OpenCover/
# Azure Stream Analytics local run output
ASALocalRun/
# MSBuild Binary and Structured Log
*.binlog
# NVidia Nsight GPU debugger configuration file
*.nvuser
# MFractors (Xamarin productivity tool) working folder
.mfractor/

py2/LICENSE.txt

@ -0,0 +1,13 @@
Single Shot Seamless Object Pose Estimation
Copyright (c) Microsoft Corporation
All rights reserved.
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

py2/MeshPly.py

@ -0,0 +1,53 @@
# Class to read PLY mesh files
class MeshPly:
def __init__(self, filename, color=[0., 0., 0.]):
f = open(filename, 'r')
self.vertices = []
self.colors = []
self.indices = []
self.normals = []
vertex_mode = False
face_mode = False
nb_vertices = 0
nb_faces = 0
idx = 0
with f as open_file_object:
for line in open_file_object:
elements = line.split()
if vertex_mode:
self.vertices.append([float(i) for i in elements[:3]])
self.normals.append([float(i) for i in elements[3:6]])
if elements[6:9]:
self.colors.append([float(i) / 255. for i in elements[6:9]])
else:
self.colors.append([float(i) / 255. for i in color])
idx += 1
if idx == nb_vertices:
vertex_mode = False
face_mode = True
idx = 0
elif face_mode:
self.indices.append([float(i) for i in elements[1:4]])
idx += 1
if idx == nb_faces:
face_mode = False
elif elements[0] == 'element':
if elements[1] == 'vertex':
nb_vertices = int(elements[2])
elif elements[1] == 'face':
nb_faces = int(elements[2])
elif elements[0] == 'end_header':
vertex_mode = True
if __name__ == '__main__':
path_model = ''
mesh = MeshPly(path_model)

py2/cfg.py

@ -0,0 +1,208 @@
import torch
from utils import convert2cpu
def parse_cfg(cfgfile):
blocks = []
fp = open(cfgfile, 'r')
block = None
line = fp.readline()
while line != '':
line = line.rstrip()
if line == '' or line[0] == '#':
line = fp.readline()
continue
elif line[0] == '[':
if block:
blocks.append(block)
block = dict()
block['type'] = line.lstrip('[').rstrip(']')
# set default value
if block['type'] == 'convolutional':
block['batch_normalize'] = 0
else:
key,value = line.split('=')
key = key.strip()
if key == 'type':
key = '_type'
value = value.strip()
block[key] = value
line = fp.readline()
if block:
blocks.append(block)
fp.close()
return blocks
def print_cfg(blocks):
print('layer filters size input output');
prev_width = 416
prev_height = 416
prev_filters = 3
out_filters =[]
out_widths =[]
out_heights =[]
ind = -2
for block in blocks:
ind = ind + 1
if block['type'] == 'net':
prev_width = int(block['width'])
prev_height = int(block['height'])
continue
elif block['type'] == 'convolutional':
filters = int(block['filters'])
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
width = (prev_width + 2*pad - kernel_size)/stride + 1
height = (prev_height + 2*pad - kernel_size)/stride + 1
print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
prev_filters = filters
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'maxpool':
pool_size = int(block['size'])
stride = int(block['stride'])
width = prev_width/stride
height = prev_height/stride
print('%5d %-6s %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'max', pool_size, pool_size, stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
prev_filters = filters
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'avgpool':
width = 1
height = 1
print('%5d %-6s %3d x %3d x%4d -> %3d' % (ind, 'avg', prev_width, prev_height, prev_filters, prev_filters))
prev_width = width
prev_height = height
prev_filters = filters
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'softmax':
print('%5d %-6s -> %3d' % (ind, 'softmax', prev_filters))
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'cost':
print('%5d %-6s -> %3d' % (ind, 'cost', prev_filters))
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'reorg':
stride = int(block['stride'])
filters = stride * stride * prev_filters
width = prev_width/stride
height = prev_height/stride
print('%5d %-6s / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (ind, 'reorg', stride, prev_width, prev_height, prev_filters, width, height, filters))
prev_width = width
prev_height = height
prev_filters = filters
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'route':
layers = block['layers'].split(',')
layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
if len(layers) == 1:
print('%5d %-6s %d' % (ind, 'route', layers[0]))
prev_width = out_widths[layers[0]]
prev_height = out_heights[layers[0]]
prev_filters = out_filters[layers[0]]
elif len(layers) == 2:
print('%5d %-6s %d %d' % (ind, 'route', layers[0], layers[1]))
prev_width = out_widths[layers[0]]
prev_height = out_heights[layers[0]]
assert(prev_width == out_widths[layers[1]])
assert(prev_height == out_heights[layers[1]])
prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'region':
print('%5d %-6s' % (ind, 'detection'))
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'shortcut':
from_id = int(block['from'])
from_id = from_id if from_id > 0 else from_id+ind
print('%5d %-6s %d' % (ind, 'shortcut', from_id))
prev_width = out_widths[from_id]
prev_height = out_heights[from_id]
prev_filters = out_filters[from_id]
out_widths.append(prev_width)
out_heights.append(prev_height)
out_filters.append(prev_filters)
elif block['type'] == 'connected':
filters = int(block['output'])
print('%5d %-6s %d -> %3d' % (ind, 'connected', prev_filters, filters))
prev_filters = filters
out_widths.append(1)
out_heights.append(1)
out_filters.append(prev_filters)
else:
print('unknown type %s' % (block['type']))
def load_conv(buf, start, conv_model):
num_w = conv_model.weight.numel()
num_b = conv_model.bias.numel()
conv_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
return start
def save_conv(fp, conv_model):
if conv_model.bias.is_cuda:
convert2cpu(conv_model.bias.data).numpy().tofile(fp)
convert2cpu(conv_model.weight.data).numpy().tofile(fp)
else:
conv_model.bias.data.numpy().tofile(fp)
conv_model.weight.data.numpy().tofile(fp)
def load_conv_bn(buf, start, conv_model, bn_model):
num_w = conv_model.weight.numel()
num_b = bn_model.bias.numel()
bn_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
bn_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
return start
def save_conv_bn(fp, conv_model, bn_model):
if bn_model.bias.is_cuda:
convert2cpu(bn_model.bias.data).numpy().tofile(fp)
convert2cpu(bn_model.weight.data).numpy().tofile(fp)
convert2cpu(bn_model.running_mean).numpy().tofile(fp)
convert2cpu(bn_model.running_var).numpy().tofile(fp)
convert2cpu(conv_model.weight.data).numpy().tofile(fp)
else:
bn_model.bias.data.numpy().tofile(fp)
bn_model.weight.data.numpy().tofile(fp)
bn_model.running_mean.numpy().tofile(fp)
bn_model.running_var.numpy().tofile(fp)
conv_model.weight.data.numpy().tofile(fp)
def load_fc(buf, start, fc_model):
num_w = fc_model.weight.numel()
num_b = fc_model.bias.numel()
fc_model.bias.data.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
fc_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
return start
def save_fc(fp, fc_model):
fc_model.bias.data.numpy().tofile(fp)
fc_model.weight.data.numpy().tofile(fp)
if __name__ == '__main__':
import sys
blocks = parse_cfg('cfg/yolo.cfg')
if len(sys.argv) == 2:
blocks = parse_cfg(sys.argv[1])
print_cfg(blocks)

py2/cfg/ape.data

@ -0,0 +1,7 @@
train = LINEMOD/ape/train.txt
valid = LINEMOD/ape/test.txt
backup = backup/ape
mesh = LINEMOD/ape/ape.ply
tr_range = LINEMOD/ape/training_range.txt
name = ape
diam = 0.103

py2/cfg/benchvise.data

@ -0,0 +1,7 @@
train = LINEMOD/benchvise/train.txt
valid = LINEMOD/benchvise/test.txt
backup = backup/benchvise
mesh = LINEMOD/benchvise/benchvise.ply
tr_range = LINEMOD/benchvise/training_range.txt
name = benchvise
diam = 0.286908

py2/cfg/cam.data

@ -0,0 +1,7 @@
train = LINEMOD/cam/train.txt
valid = LINEMOD/cam/test.txt
backup = backup/cam
mesh = LINEMOD/cam/cam.ply
tr_range = LINEMOD/cam/training_range.txt
name = cam
diam = 0.173

py2/cfg/can.data

@ -0,0 +1,7 @@
train = LINEMOD/can/train.txt
valid = LINEMOD/can/test.txt
backup = backup/can
mesh = LINEMOD/can/can.ply
tr_range = LINEMOD/can/training_range.txt
name = can
diam = 0.202

py2/cfg/cat.data

@ -0,0 +1,7 @@
train = LINEMOD/cat/train.txt
valid = LINEMOD/cat/test.txt
backup = backup/cat
mesh = LINEMOD/cat/cat.ply
tr_range = LINEMOD/cat/training_range.txt
name = cat
diam = 0.155

py2/cfg/driller.data

@ -0,0 +1,7 @@
train = LINEMOD/driller/train.txt
valid = LINEMOD/driller/test.txt
backup = backup/driller
mesh = LINEMOD/driller/driller.ply
tr_range = LINEMOD/driller/training_range.txt
name = driller
diam = 0.262

py2/cfg/duck.data

@ -0,0 +1,7 @@
train = LINEMOD/duck/train.txt
valid = LINEMOD/duck/test.txt
backup = backup/duck
mesh = LINEMOD/duck/duck.ply
tr_range = LINEMOD/duck/training_range.txt
name = duck
diam = 0.109

py2/cfg/eggbox.data

@ -0,0 +1,7 @@
train = LINEMOD/eggbox/train.txt
valid = LINEMOD/eggbox/test.txt
backup = backup/eggbox
mesh = LINEMOD/eggbox/eggbox.ply
tr_range = LINEMOD/eggbox/training_range.txt
name = eggbox
diam = 0.176364

py2/cfg/glue.data

@ -0,0 +1,7 @@
train = LINEMOD/glue/train.txt
valid = LINEMOD/glue/test.txt
backup = backup/glue
mesh = LINEMOD/glue/glue.ply
tr_range = LINEMOD/glue/training_range.txt
name = glue
diam = 0.176

py2/cfg/holepuncher.data

@ -0,0 +1,7 @@
train = LINEMOD/holepuncher/train.txt
valid = LINEMOD/holepuncher/test.txt
backup = backup/holepuncher
mesh = LINEMOD/holepuncher/holepuncher.ply
tr_range = LINEMOD/holepuncher/training_range.txt
name = holepuncher
diam = 0.162

7
py2/cfg/iron.data Normal file
Просмотреть файл

@ -0,0 +1,7 @@
train = LINEMOD/iron/train.txt
valid = LINEMOD/iron/test.txt
backup = backup/iron
mesh = LINEMOD/iron/iron.ply
tr_range = LINEMOD/iron/training_range.txt
name = iron
diam = 0.303153

7
py2/cfg/lamp.data Normal file
Просмотреть файл

@ -0,0 +1,7 @@
train = LINEMOD/lamp/train.txt
valid = LINEMOD/lamp/test.txt
backup = backup/lamp
mesh = LINEMOD/lamp/lamp.ply
tr_range = LINEMOD/lamp/training_range.txt
name = lamp
diam = 0.285155

7
py2/cfg/phone.data Normal file
Просмотреть файл

@ -0,0 +1,7 @@
train = LINEMOD/phone/train.txt
valid = LINEMOD/phone/test.txt
backup = backup/phone
mesh = LINEMOD/phone/phone.ply
tr_range = LINEMOD/phone/training_range.txt
name = phone
diam = 0.213

256
py2/cfg/yolo-pose-pre.cfg Normal file
Просмотреть файл

@ -0,0 +1,256 @@
[net]
batch=32
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
# steps=-1,500,20000,30000
# steps=-1,180,360,540
steps=-1,50,1000,2000
scales=0.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
#######
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[route]
layers=-9
[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky
[reorg]
stride=2
[route]
layers=-1,-4
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
# filters=125
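# for this 13-class pretraining cfg: (18 coords + 13 classes + 1 conf) * 1 anchor = 32 output filters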
filters=32
activation=linear
[region]
anchors =
bias_match=1
classes=13
coords=18
num=1
softmax=1
jitter=.3
rescore=1
object_scale=0
noobject_scale=0
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1

255
py2/cfg/yolo-pose.cfg Normal file
Просмотреть файл

@ -0,0 +1,255 @@
[net]
batch=32
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
# steps=-1,500,20000,30000
steps=-1,50,3000,6000
scales=0.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
#######
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[route]
layers=-9
[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky
[reorg]
stride=2
[route]
layers=-1,-4
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
# for a custom dataset, filters should be equal to (num_coords + num_classes + 1 conf value) * num_anchors
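# here: (18 coords + 1 class + 1 conf) * 1 anchor = 20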
filters=20
activation=linear
[region]
anchors =
bias_match=1
classes=1
coords=18
num=1
softmax=1
jitter=.3
rescore=1
object_scale=5
noobject_scale=0.1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1

391
py2/darknet.py Normal file
Просмотреть файл

@ -0,0 +1,391 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from region_loss import RegionLoss
from cfg import *
class MaxPoolStride1(nn.Module):
def __init__(self):
super(MaxPoolStride1, self).__init__()
def forward(self, x):
x = F.max_pool2d(F.pad(x, (0,1,0,1), mode='replicate'), 2, stride=1)
return x
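# Reorg (space-to-depth): packs each stride x stride spatial block into the channel dimension,
# used in the passthrough connection of the YOLOv2-style backbone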
class Reorg(nn.Module):
def __init__(self, stride=2):
super(Reorg, self).__init__()
self.stride = stride
def forward(self, x):
stride = self.stride
assert(x.data.dim() == 4)
B = x.data.size(0)
C = x.data.size(1)
H = x.data.size(2)
W = x.data.size(3)
assert(H % stride == 0)
assert(W % stride == 0)
ws = stride
hs = stride
x = x.view(B, C, H/hs, hs, W/ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H/hs*W/ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H/hs, W/ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H/hs, W/ws)
return x
class GlobalAvgPool2d(nn.Module):
def __init__(self):
super(GlobalAvgPool2d, self).__init__()
def forward(self, x):
N = x.data.size(0)
C = x.data.size(1)
H = x.data.size(2)
W = x.data.size(3)
x = F.avg_pool2d(x, (H, W))
x = x.view(N, C)
return x
# for route and shortcut
class EmptyModule(nn.Module):
def __init__(self):
super(EmptyModule, self).__init__()
def forward(self, x):
return x
# support route shortcut and reorg
class Darknet(nn.Module):
def __init__(self, cfgfile):
super(Darknet, self).__init__()
self.blocks = parse_cfg(cfgfile)
self.models = self.create_network(self.blocks) # merge conv, bn,leaky
self.loss = self.models[len(self.models)-1]
self.width = int(self.blocks[0]['width'])
self.height = int(self.blocks[0]['height'])
if self.blocks[(len(self.blocks)-1)]['type'] == 'region':
self.anchors = self.loss.anchors
self.num_anchors = self.loss.num_anchors
self.anchor_step = self.loss.anchor_step
self.num_classes = self.loss.num_classes
self.header = torch.IntTensor([0,0,0,0])
self.seen = 0
self.iter = 0
def forward(self, x):
ind = -2
self.loss = None
outputs = dict()
for block in self.blocks:
ind = ind + 1
#if ind > 0:
# return x
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional' or block['type'] == 'maxpool' or block['type'] == 'reorg' or block['type'] == 'avgpool' or block['type'] == 'softmax' or block['type'] == 'connected':
x = self.models[ind](x)
outputs[ind] = x
elif block['type'] == 'route':
layers = block['layers'].split(',')
layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
if len(layers) == 1:
x = outputs[layers[0]]
outputs[ind] = x
elif len(layers) == 2:
x1 = outputs[layers[0]]
x2 = outputs[layers[1]]
x = torch.cat((x1,x2),1)
outputs[ind] = x
elif block['type'] == 'shortcut':
from_layer = int(block['from'])
activation = block['activation']
from_layer = from_layer if from_layer > 0 else from_layer + ind
x1 = outputs[from_layer]
x2 = outputs[ind-1]
x = x1 + x2
if activation == 'leaky':
x = F.leaky_relu(x, 0.1, inplace=True)
elif activation == 'relu':
x = F.relu(x, inplace=True)
outputs[ind] = x
elif block['type'] == 'region':
continue
if self.loss:
self.loss = self.loss + self.models[ind](x)
else:
self.loss = self.models[ind](x)
outputs[ind] = None
elif block['type'] == 'cost':
continue
else:
print('unknown type %s' % (block['type']))
return x
def print_network(self):
print_cfg(self.blocks)
def create_network(self, blocks):
models = nn.ModuleList()
prev_filters = 3
out_filters =[]
conv_id = 0
for block in blocks:
if block['type'] == 'net':
prev_filters = int(block['channels'])
continue
elif block['type'] == 'convolutional':
conv_id = conv_id + 1
batch_normalize = int(block['batch_normalize'])
filters = int(block['filters'])
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
activation = block['activation']
model = nn.Sequential()
if batch_normalize:
model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=False))
model.add_module('bn{0}'.format(conv_id), nn.BatchNorm2d(filters, eps=1e-4))
#model.add_module('bn{0}'.format(conv_id), BN2d(filters))
else:
model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad))
if activation == 'leaky':
model.add_module('leaky{0}'.format(conv_id), nn.LeakyReLU(0.1, inplace=True))
elif activation == 'relu':
model.add_module('relu{0}'.format(conv_id), nn.ReLU(inplace=True))
prev_filters = filters
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'maxpool':
pool_size = int(block['size'])
stride = int(block['stride'])
if stride > 1:
model = nn.MaxPool2d(pool_size, stride)
else:
model = MaxPoolStride1()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'avgpool':
model = GlobalAvgPool2d()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'softmax':
model = nn.Softmax()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'cost':
if block['_type'] == 'sse':
model = nn.MSELoss(size_average=True)
elif block['_type'] == 'L1':
model = nn.L1Loss(size_average=True)
elif block['_type'] == 'smooth':
model = nn.SmoothL1Loss(size_average=True)
out_filters.append(1)
models.append(model)
elif block['type'] == 'reorg':
stride = int(block['stride'])
prev_filters = stride * stride * prev_filters
out_filters.append(prev_filters)
models.append(Reorg(stride))
elif block['type'] == 'route':
layers = block['layers'].split(',')
ind = len(models)
layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
if len(layers) == 1:
prev_filters = out_filters[layers[0]]
elif len(layers) == 2:
assert(layers[0] == ind - 1)
prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
out_filters.append(prev_filters)
models.append(EmptyModule())
elif block['type'] == 'shortcut':
ind = len(models)
prev_filters = out_filters[ind-1]
out_filters.append(prev_filters)
models.append(EmptyModule())
elif block['type'] == 'connected':
filters = int(block['output'])
if block['activation'] == 'linear':
model = nn.Linear(prev_filters, filters)
elif block['activation'] == 'leaky':
model = nn.Sequential(
nn.Linear(prev_filters, filters),
nn.LeakyReLU(0.1, inplace=True))
elif block['activation'] == 'relu':
model = nn.Sequential(
nn.Linear(prev_filters, filters),
nn.ReLU(inplace=True))
prev_filters = filters
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'region':
loss = RegionLoss()
anchors = block['anchors'].split(',')
if anchors == ['']:
loss.anchors = []
else:
loss.anchors = [float(i) for i in anchors]
loss.num_classes = int(block['classes'])
loss.num_anchors = int(block['num'])
loss.anchor_step = len(loss.anchors)/loss.num_anchors
loss.object_scale = float(block['object_scale'])
loss.noobject_scale = float(block['noobject_scale'])
loss.class_scale = float(block['class_scale'])
loss.coord_scale = float(block['coord_scale'])
out_filters.append(prev_filters)
models.append(loss)
else:
print('unknown type %s' % (block['type']))
return models
def load_weights(self, weightfile):
fp = open(weightfile, 'rb')
header = np.fromfile(fp, count=4, dtype=np.int32)
self.header = torch.from_numpy(header)
self.seen = self.header[3]
buf = np.fromfile(fp, dtype = np.float32)
fp.close()
start = 0
ind = -2
for block in self.blocks:
if start >= buf.size:
break
ind = ind + 1
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
start = load_conv_bn(buf, start, model[0], model[1])
else:
start = load_conv(buf, start, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
start = load_fc(buf, start, model[0])
else:
start = load_fc(buf, start, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
def load_weights_until_last(self, weightfile):
fp = open(weightfile, 'rb')
header = np.fromfile(fp, count=4, dtype=np.int32)
self.header = torch.from_numpy(header)
self.seen = self.header[3]
buf = np.fromfile(fp, dtype = np.float32)
fp.close()
start = 0
ind = -2
blocklen = len(self.blocks)
for i in range(blocklen-2):
block = self.blocks[i]
if start >= buf.size:
break
ind = ind + 1
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
start = load_conv_bn(buf, start, model[0], model[1])
else:
start = load_conv(buf, start, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
start = load_fc(buf, start, model[0])
else:
start = load_fc(buf, start, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
def save_weights(self, outfile, cutoff=0):
if cutoff <= 0:
cutoff = len(self.blocks)-1
fp = open(outfile, 'wb')
self.header[3] = self.seen
header = self.header
header.numpy().tofile(fp)
ind = -1
for blockId in range(1, cutoff+1):
ind = ind + 1
block = self.blocks[blockId]
if block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
save_conv_bn(fp, model[0], model[1])
else:
save_conv(fp, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
save_fc(fp, model[0])
else:
save_fc(fp, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
fp.close()

101
py2/dataset.py Normal file
Просмотреть файл

@ -0,0 +1,101 @@
#!/usr/bin/python
# encoding: utf-8
import os
import random
from PIL import Image
import numpy as np
from image import *
import torch
from torch.utils.data import Dataset
from utils import read_truths_args, read_truths, get_all_files
class listDataset(Dataset):
def __init__(self, root, shape=None, shuffle=True, transform=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, bg_file_names=None):
with open(root, 'r') as file:
self.lines = file.readlines()
if shuffle:
random.shuffle(self.lines)
self.nSamples = len(self.lines)
self.transform = transform
self.target_transform = target_transform
self.train = train
self.shape = shape
self.seen = seen
self.batch_size = batch_size
self.num_workers = num_workers
self.bg_file_names = bg_file_names
def __len__(self):
return self.nSamples
def __getitem__(self, index):
assert index <= len(self), 'index range error'
imgpath = self.lines[index].rstrip()
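# multi-scale training: every 32nd sample, pick a new square input resolution (a multiple of 32);
# the range of allowed sizes widens as more samples are seen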
if self.train and index % 32 == 0:
if self.seen < 400*32:
width = 13*32
self.shape = (width, width)
elif self.seen < 800*32:
width = (random.randint(0,7) + 13)*32
self.shape = (width, width)
elif self.seen < 1200*32:
width = (random.randint(0,9) + 12)*32
self.shape = (width, width)
elif self.seen < 1600*32:
width = (random.randint(0,11) + 11)*32
self.shape = (width, width)
elif self.seen < 2000*32:
width = (random.randint(0,13) + 10)*32
self.shape = (width, width)
elif self.seen < 2400*32:
width = (random.randint(0,15) + 9)*32
self.shape = (width, width)
elif self.seen < 3000*32:
width = (random.randint(0,17) + 8)*32
self.shape = (width, width)
else: # self.seen < 20000*64:
width = (random.randint(0,19) + 7)*32
self.shape = (width, width)
if self.train:
jitter = 0.2
hue = 0.1
saturation = 1.5
exposure = 1.5
# Get background image path
random_bg_index = random.randint(0, len(self.bg_file_names) - 1)
bgpath = self.bg_file_names[random_bg_index]
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
label = torch.from_numpy(label)
else:
img = Image.open(imgpath).convert('RGB')
if self.shape:
img = img.resize(self.shape)
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
label = torch.zeros(50*21)
if os.path.getsize(labpath):
ow, oh = img.size
tmp = torch.from_numpy(read_truths_args(labpath, 8.0/ow))
tmp = tmp.view(-1)
tsz = tmp.numel()
if tsz > 50*21:
label = tmp[0:50*21]
elif tsz > 0:
label[0:tsz] = tmp
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
label = self.target_transform(label)
self.seen = self.seen + self.num_workers
return (img, label)

184
py2/image.py Normal file
Просмотреть файл

@ -0,0 +1,184 @@
#!/usr/bin/python
# encoding: utf-8
import random
import os
from PIL import Image, ImageChops, ImageMath
import numpy as np
def scale_image_channel(im, c, v):
cs = list(im.split())
cs[c] = cs[c].point(lambda i: i * v)
out = Image.merge(im.mode, tuple(cs))
return out
def distort_image(im, hue, sat, val):
im = im.convert('HSV')
cs = list(im.split())
cs[1] = cs[1].point(lambda i: i * sat)
cs[2] = cs[2].point(lambda i: i * val)
def change_hue(x):
x += hue*255
if x > 255:
x -= 255
if x < 0:
x += 255
return x
cs[0] = cs[0].point(change_hue)
im = Image.merge(im.mode, tuple(cs))
im = im.convert('RGB')
return im
def rand_scale(s):
scale = random.uniform(1, s)
if(random.randint(1,10000)%2):
return scale
return 1./scale
def random_distort_image(im, hue, saturation, exposure):
dhue = random.uniform(-hue, hue)
dsat = rand_scale(saturation)
dexp = rand_scale(exposure)
res = distort_image(im, dhue, dsat, dexp)
return res
def data_augmentation(img, shape, jitter, hue, saturation, exposure):
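# random crop with jitter followed by HSV distortion; also returns the flip flag and the
# crop offsets/scales needed to remap the 2D labels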
ow, oh = img.size
dw =int(ow*jitter)
dh =int(oh*jitter)
pleft = random.randint(-dw, dw)
pright = random.randint(-dw, dw)
ptop = random.randint(-dh, dh)
pbot = random.randint(-dh, dh)
swidth = ow - pleft - pright
sheight = oh - ptop - pbot
sx = float(swidth) / ow
sy = float(sheight) / oh
flip = random.randint(1,10000)%2
cropped = img.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
dx = (float(pleft)/ow)/sx
dy = (float(ptop) /oh)/sy
sized = cropped.resize(shape)
img = random_distort_image(sized, hue, saturation, exposure)
return img, flip, dx,dy,sx,sy
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy):
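# builds one 21-value row per object: class, centroid (x0, y0), the 8 projected box corners
# (x1, y1, ..., x8, y8) and the x/y ranges; keypoints are shifted/scaled to the augmented crop
# and clipped to [0, 1)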
max_boxes = 50
label = np.zeros((max_boxes,21))
if os.path.getsize(labpath):
bs = np.loadtxt(labpath)
if bs is None:
return label
bs = np.reshape(bs, (-1, 21))
cc = 0
for i in range(bs.shape[0]):
x0 = bs[i][1]
y0 = bs[i][2]
x1 = bs[i][3]
y1 = bs[i][4]
x2 = bs[i][5]
y2 = bs[i][6]
x3 = bs[i][7]
y3 = bs[i][8]
x4 = bs[i][9]
y4 = bs[i][10]
x5 = bs[i][11]
y5 = bs[i][12]
x6 = bs[i][13]
y6 = bs[i][14]
x7 = bs[i][15]
y7 = bs[i][16]
x8 = bs[i][17]
y8 = bs[i][18]
x0 = min(0.999, max(0, x0 * sx - dx))
y0 = min(0.999, max(0, y0 * sy - dy))
x1 = min(0.999, max(0, x1 * sx - dx))
y1 = min(0.999, max(0, y1 * sy - dy))
x2 = min(0.999, max(0, x2 * sx - dx))
y2 = min(0.999, max(0, y2 * sy - dy))
x3 = min(0.999, max(0, x3 * sx - dx))
y3 = min(0.999, max(0, y3 * sy - dy))
x4 = min(0.999, max(0, x4 * sx - dx))
y4 = min(0.999, max(0, y4 * sy - dy))
x5 = min(0.999, max(0, x5 * sx - dx))
y5 = min(0.999, max(0, y5 * sy - dy))
x6 = min(0.999, max(0, x6 * sx - dx))
y6 = min(0.999, max(0, y6 * sy - dy))
x7 = min(0.999, max(0, x7 * sx - dx))
y7 = min(0.999, max(0, y7 * sy - dy))
x8 = min(0.999, max(0, x8 * sx - dx))
y8 = min(0.999, max(0, y8 * sy - dy))
bs[i][1] = x0
bs[i][2] = y0
bs[i][3] = x1
bs[i][4] = y1
bs[i][5] = x2
bs[i][6] = y2
bs[i][7] = x3
bs[i][8] = y3
bs[i][9] = x4
bs[i][10] = y4
bs[i][11] = x5
bs[i][12] = y5
bs[i][13] = x6
bs[i][14] = y6
bs[i][15] = x7
bs[i][16] = y7
bs[i][17] = x8
bs[i][18] = y8
label[cc] = bs[i]
cc += 1
if cc >= 50:
break
label = np.reshape(label, (-1))
return label
def change_background(img, mask, bg):
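# composite the object onto a new background: out = img * mask + bg * (1 - mask), computed per channel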
# oh = img.height
# ow = img.width
ow, oh = img.size
bg = bg.resize((ow, oh)).convert('RGB')
imcs = list(img.split())
bgcs = list(bg.split())
maskcs = list(mask.split())
fics = list(Image.new(img.mode, img.size).split())
for c in range(len(imcs)):
negmask = maskcs[c].point(lambda i: 1 - i / 255)
posmask = maskcs[c].point(lambda i: i / 255)
fics[c] = ImageMath.eval("a * c + b * d", a=imcs[c], b=bgcs[c], c=posmask, d=negmask).convert('L')
out = Image.merge(img.mode, tuple(fics))
return out
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
## data augmentation
img = Image.open(imgpath).convert('RGB')
mask = Image.open(maskpath).convert('RGB')
bg = Image.open(bgpath).convert('RGB')
img = change_background(img, mask, bg)
img,flip,dx,dy,sx,sy = data_augmentation(img, shape, jitter, hue, saturation, exposure)
ow, oh = img.size
label = fill_truth_detection(labpath, ow, oh, flip, dx, dy, 1./sx, 1./sy)
return img,label

Просмотреть файл

@ -0,0 +1,13 @@
#### Label file creation
You can follow these steps to create labels for your custom dataset:
1. Get the 3D bounding box surrounding the 3D object model. We use the 3D object models already provided with the LINEMOD dataset to get the 3D bounding box. If you would like to create a 3D model for a custom object, you can refer to Section 3.5 of the following paper and the references therein: http://cmp.felk.cvut.cz/~hodanto2/data/hodan2017tless.pdf
2. Define the 8 corners of the 3D bounding box and the centroid of the 3D object model as the virtual keypoints of the object. The 8 corners correspond to the [[min_x, min_y, min_z], [min_x, min_y, max_z], [min_x, max_y, min_z], [min_x, max_y, max_z], [max_x, min_y, min_z], [max_x, min_y, max_z], [max_x, max_y, min_z], [max_x, max_y, max_z]] positions of the 3D object model, and the centroid corresponds to the [0, 0, 0] position.
3. Project the 3D keypoints to 2D. You can use the [compute_projection](https://github.com/Microsoft/singleshotpose/blob/master/utils.py#L39:L44) function that we provide to project the 3D points to 2D. You need to know the intrinsic calibration matrix of the camera and the ground-truth rotation and translation to project the 3D points to 2D. Typically, obtaining ground-truth Rt transformation matrices requires a manual and intrusive annotation effort. For an example of how to acquire ground-truth data for 6D pose estimation, please refer to Section 3.1 of the [paper](http://cmp.felk.cvut.cz/~hodanto2/data/hodan2017tless.pdf) describing the T-LESS dataset.
4. Compute the width and height of a 2D rectangle tightly fitted to a masked region around the object. If you already have 2D bounding box information (e.g. width and height) for your custom object, you can use those values in your label file. In practice, however, we fit a tight bounding box to the 8 corners of the projected 3D bounding box and use the width and height of that bounding box to represent these values.
5. Create an array consisting of the class label, the 2D keypoint locations and the range information, and write it into a text file. The label file is organized in the following order. 1st number: class label, 2nd number: x0 (x-coordinate of the centroid), 3rd number: y0 (y-coordinate of the centroid), 4th number: x1 (x-coordinate of the first corner), 5th number: y1 (y-coordinate of the first corner), ..., 18th number: x8 (x-coordinate of the eighth corner), 19th number: y8 (y-coordinate of the eighth corner), 20th number: x range, 21st number: y range. A minimal sketch of these projection and label-writing steps is given below.
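
The sketch below is only an illustration of steps 3 and 5, not code from this repository: `create_label_line`, `corners3D`, `K`, `R`, `t`, `im_w` and `im_h` are hypothetical names for quantities you are assumed to already have, and the projection is written out explicitly (it computes the same thing as the `compute_projection` helper linked above).
```
import numpy as np

def create_label_line(class_id, corners3D, K, R, t, im_w, im_h):
    # Stack the centroid ([0, 0, 0] in the object frame) and the 8 corners -> (3, 9)
    pts3D = np.hstack((np.zeros((3, 1)), corners3D))
    # Homogeneous coordinates -> (4, 9)
    pts3D_h = np.vstack((pts3D, np.ones((1, pts3D.shape[1]))))
    Rt = np.hstack((R, t.reshape(3, 1)))          # (3, 4) ground-truth transformation
    proj = K.dot(Rt).dot(pts3D_h)                 # perspective projection of all 9 keypoints
    xs = proj[0, :] / proj[2, :] / float(im_w)    # normalize pixel coordinates to [0, 1]
    ys = proj[1, :] / proj[2, :] / float(im_h)
    # x/y range: width and height of the tight 2D box around the 8 projected corners
    x_range = xs[1:].max() - xs[1:].min()
    y_range = ys[1:].max() - ys[1:].min()
    values = []
    for x, y in zip(xs, ys):                      # centroid first, then the 8 corners
        values += [x, y]
    values += [x_range, y_range]                  # 1 class + 18 coordinates + 2 ranges = 21 values
    return ' '.join(['%d' % class_id] + ['%f' % v for v in values])
```
Writing one such line per training image into the corresponding `labels/*.txt` file reproduces the 21-value format described in step 5.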

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/ape/test_occlusion.txt
mesh = ../LINEMOD/ape/ape.ply
backup = backup_multi
name = ape
diam = 0.103

Просмотреть файл

@ -0,0 +1,7 @@
train = ../LINEMOD/benchvise/train.txt
valid = ../LINEMOD/benchvise/test.txt
backup = backup_multi
mesh = ../LINEMOD/benchvise/benchvise.ply
tr_range = ../LINEMOD/benchvise/training_range.txt
name = benchvise
diam = 0.286908

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/can/test_occlusion.txt
mesh = ../LINEMOD/can/can.ply
backup = backup_multi
name = can
diam = 0.202

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/cat/test_occlusion.txt
mesh = ../LINEMOD/cat/cat.ply
backup = backup_multi
name = cat
diam = 0.155

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/driller/test_occlusion.txt
mesh = ../LINEMOD/driller/driller.ply
backup = backup_multi
name = driller
diam = 0.262

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/duck/test_occlusion.txt
mesh = ../LINEMOD/duck/duck.ply
backup = backup_multi
name = duck
diam = 0.109

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/eggbox/test_occlusion.txt
mesh = ../LINEMOD/eggbox/eggbox.ply
backup = backup_multi
name = eggbox
diam = 0.176364

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/glue/test_occlusion.txt
mesh = ../LINEMOD/glue/glue.ply
backup = backup_multi
name = glue
diam = 0.176

Просмотреть файл

@ -0,0 +1,5 @@
valid = ../LINEMOD/holepuncher/test_occlusion.txt
mesh = ../LINEMOD/holepuncher/holepuncher.ply
backup = backup_multi
name = holepuncher
diam = 0.162

Просмотреть файл

@ -0,0 +1,23 @@
train = cfg/train_occlusion.txt
valid1 = ../LINEMOD/ape/test_occlusion.txt
valid4 = ../LINEMOD/can/test_occlusion.txt
valid5 = ../LINEMOD/cat/test_occlusion.txt
valid6 = ../LINEMOD/driller/test_occlusion.txt
valid7 = ../LINEMOD/duck/test_occlusion.txt
valid9 = ../LINEMOD/glue/test_occlusion.txt
valid10 = ../LINEMOD/holepuncher/test_occlusion.txt
backup = backup_multi
mesh1 = ../LINEMOD/ape/ape.ply
mesh4 = ../LINEMOD/can/can.ply
mesh5 = ../LINEMOD/cat/cat.ply
mesh6 = ../LINEMOD/driller/driller.ply
mesh7 = ../LINEMOD/duck/duck.ply
mesh9 = ../LINEMOD/glue/glue.ply
mesh10 = ../LINEMOD/holepuncher/holepuncher.ply
diam1 = 0.103
diam4 = 0.202
diam5 = 0.155
diam6 = 0.262
diam7 = 0.109
diam9 = 0.176
diam10 = 0.162

Просмотреть файл

@ -0,0 +1,183 @@
../LINEMOD/benchvise/JPEGImages/000024.jpg
../LINEMOD/benchvise/JPEGImages/000030.jpg
../LINEMOD/benchvise/JPEGImages/000045.jpg
../LINEMOD/benchvise/JPEGImages/000053.jpg
../LINEMOD/benchvise/JPEGImages/000063.jpg
../LINEMOD/benchvise/JPEGImages/000065.jpg
../LINEMOD/benchvise/JPEGImages/000071.jpg
../LINEMOD/benchvise/JPEGImages/000072.jpg
../LINEMOD/benchvise/JPEGImages/000076.jpg
../LINEMOD/benchvise/JPEGImages/000078.jpg
../LINEMOD/benchvise/JPEGImages/000091.jpg
../LINEMOD/benchvise/JPEGImages/000092.jpg
../LINEMOD/benchvise/JPEGImages/000095.jpg
../LINEMOD/benchvise/JPEGImages/000099.jpg
../LINEMOD/benchvise/JPEGImages/000103.jpg
../LINEMOD/benchvise/JPEGImages/000106.jpg
../LINEMOD/benchvise/JPEGImages/000116.jpg
../LINEMOD/benchvise/JPEGImages/000123.jpg
../LINEMOD/benchvise/JPEGImages/000130.jpg
../LINEMOD/benchvise/JPEGImages/000134.jpg
../LINEMOD/benchvise/JPEGImages/000139.jpg
../LINEMOD/benchvise/JPEGImages/000146.jpg
../LINEMOD/benchvise/JPEGImages/000152.jpg
../LINEMOD/benchvise/JPEGImages/000153.jpg
../LINEMOD/benchvise/JPEGImages/000155.jpg
../LINEMOD/benchvise/JPEGImages/000157.jpg
../LINEMOD/benchvise/JPEGImages/000158.jpg
../LINEMOD/benchvise/JPEGImages/000161.jpg
../LINEMOD/benchvise/JPEGImages/000163.jpg
../LINEMOD/benchvise/JPEGImages/000167.jpg
../LINEMOD/benchvise/JPEGImages/000172.jpg
../LINEMOD/benchvise/JPEGImages/000174.jpg
../LINEMOD/benchvise/JPEGImages/000183.jpg
../LINEMOD/benchvise/JPEGImages/000200.jpg
../LINEMOD/benchvise/JPEGImages/000214.jpg
../LINEMOD/benchvise/JPEGImages/000221.jpg
../LINEMOD/benchvise/JPEGImages/000226.jpg
../LINEMOD/benchvise/JPEGImages/000235.jpg
../LINEMOD/benchvise/JPEGImages/000239.jpg
../LINEMOD/benchvise/JPEGImages/000243.jpg
../LINEMOD/benchvise/JPEGImages/000271.jpg
../LINEMOD/benchvise/JPEGImages/000274.jpg
../LINEMOD/benchvise/JPEGImages/000277.jpg
../LINEMOD/benchvise/JPEGImages/000286.jpg
../LINEMOD/benchvise/JPEGImages/000291.jpg
../LINEMOD/benchvise/JPEGImages/000294.jpg
../LINEMOD/benchvise/JPEGImages/000302.jpg
../LINEMOD/benchvise/JPEGImages/000307.jpg
../LINEMOD/benchvise/JPEGImages/000314.jpg
../LINEMOD/benchvise/JPEGImages/000320.jpg
../LINEMOD/benchvise/JPEGImages/000324.jpg
../LINEMOD/benchvise/JPEGImages/000347.jpg
../LINEMOD/benchvise/JPEGImages/000350.jpg
../LINEMOD/benchvise/JPEGImages/000355.jpg
../LINEMOD/benchvise/JPEGImages/000364.jpg
../LINEMOD/benchvise/JPEGImages/000367.jpg
../LINEMOD/benchvise/JPEGImages/000369.jpg
../LINEMOD/benchvise/JPEGImages/000376.jpg
../LINEMOD/benchvise/JPEGImages/000377.jpg
../LINEMOD/benchvise/JPEGImages/000379.jpg
../LINEMOD/benchvise/JPEGImages/000383.jpg
../LINEMOD/benchvise/JPEGImages/000384.jpg
../LINEMOD/benchvise/JPEGImages/000387.jpg
../LINEMOD/benchvise/JPEGImages/000394.jpg
../LINEMOD/benchvise/JPEGImages/000402.jpg
../LINEMOD/benchvise/JPEGImages/000406.jpg
../LINEMOD/benchvise/JPEGImages/000410.jpg
../LINEMOD/benchvise/JPEGImages/000413.jpg
../LINEMOD/benchvise/JPEGImages/000422.jpg
../LINEMOD/benchvise/JPEGImages/000425.jpg
../LINEMOD/benchvise/JPEGImages/000430.jpg
../LINEMOD/benchvise/JPEGImages/000434.jpg
../LINEMOD/benchvise/JPEGImages/000441.jpg
../LINEMOD/benchvise/JPEGImages/000446.jpg
../LINEMOD/benchvise/JPEGImages/000451.jpg
../LINEMOD/benchvise/JPEGImages/000456.jpg
../LINEMOD/benchvise/JPEGImages/000461.jpg
../LINEMOD/benchvise/JPEGImages/000465.jpg
../LINEMOD/benchvise/JPEGImages/000471.jpg
../LINEMOD/benchvise/JPEGImages/000480.jpg
../LINEMOD/benchvise/JPEGImages/000483.jpg
../LINEMOD/benchvise/JPEGImages/000493.jpg
../LINEMOD/benchvise/JPEGImages/000496.jpg
../LINEMOD/benchvise/JPEGImages/000498.jpg
../LINEMOD/benchvise/JPEGImages/000507.jpg
../LINEMOD/benchvise/JPEGImages/000512.jpg
../LINEMOD/benchvise/JPEGImages/000525.jpg
../LINEMOD/benchvise/JPEGImages/000527.jpg
../LINEMOD/benchvise/JPEGImages/000532.jpg
../LINEMOD/benchvise/JPEGImages/000533.jpg
../LINEMOD/benchvise/JPEGImages/000534.jpg
../LINEMOD/benchvise/JPEGImages/000539.jpg
../LINEMOD/benchvise/JPEGImages/000554.jpg
../LINEMOD/benchvise/JPEGImages/000556.jpg
../LINEMOD/benchvise/JPEGImages/000568.jpg
../LINEMOD/benchvise/JPEGImages/000571.jpg
../LINEMOD/benchvise/JPEGImages/000573.jpg
../LINEMOD/benchvise/JPEGImages/000576.jpg
../LINEMOD/benchvise/JPEGImages/000598.jpg
../LINEMOD/benchvise/JPEGImages/000603.jpg
../LINEMOD/benchvise/JPEGImages/000604.jpg
../LINEMOD/benchvise/JPEGImages/000609.jpg
../LINEMOD/benchvise/JPEGImages/000627.jpg
../LINEMOD/benchvise/JPEGImages/000635.jpg
../LINEMOD/benchvise/JPEGImages/000641.jpg
../LINEMOD/benchvise/JPEGImages/000649.jpg
../LINEMOD/benchvise/JPEGImages/000653.jpg
../LINEMOD/benchvise/JPEGImages/000656.jpg
../LINEMOD/benchvise/JPEGImages/000659.jpg
../LINEMOD/benchvise/JPEGImages/000668.jpg
../LINEMOD/benchvise/JPEGImages/000676.jpg
../LINEMOD/benchvise/JPEGImages/000692.jpg
../LINEMOD/benchvise/JPEGImages/000697.jpg
../LINEMOD/benchvise/JPEGImages/000706.jpg
../LINEMOD/benchvise/JPEGImages/000715.jpg
../LINEMOD/benchvise/JPEGImages/000717.jpg
../LINEMOD/benchvise/JPEGImages/000726.jpg
../LINEMOD/benchvise/JPEGImages/000735.jpg
../LINEMOD/benchvise/JPEGImages/000744.jpg
../LINEMOD/benchvise/JPEGImages/000747.jpg
../LINEMOD/benchvise/JPEGImages/000752.jpg
../LINEMOD/benchvise/JPEGImages/000758.jpg
../LINEMOD/benchvise/JPEGImages/000760.jpg
../LINEMOD/benchvise/JPEGImages/000772.jpg
../LINEMOD/benchvise/JPEGImages/000775.jpg
../LINEMOD/benchvise/JPEGImages/000780.jpg
../LINEMOD/benchvise/JPEGImages/000785.jpg
../LINEMOD/benchvise/JPEGImages/000800.jpg
../LINEMOD/benchvise/JPEGImages/000802.jpg
../LINEMOD/benchvise/JPEGImages/000828.jpg
../LINEMOD/benchvise/JPEGImages/000837.jpg
../LINEMOD/benchvise/JPEGImages/000842.jpg
../LINEMOD/benchvise/JPEGImages/000845.jpg
../LINEMOD/benchvise/JPEGImages/000847.jpg
../LINEMOD/benchvise/JPEGImages/000850.jpg
../LINEMOD/benchvise/JPEGImages/000859.jpg
../LINEMOD/benchvise/JPEGImages/000875.jpg
../LINEMOD/benchvise/JPEGImages/000880.jpg
../LINEMOD/benchvise/JPEGImages/000883.jpg
../LINEMOD/benchvise/JPEGImages/000891.jpg
../LINEMOD/benchvise/JPEGImages/000892.jpg
../LINEMOD/benchvise/JPEGImages/000915.jpg
../LINEMOD/benchvise/JPEGImages/000916.jpg
../LINEMOD/benchvise/JPEGImages/000923.jpg
../LINEMOD/benchvise/JPEGImages/000931.jpg
../LINEMOD/benchvise/JPEGImages/000933.jpg
../LINEMOD/benchvise/JPEGImages/000941.jpg
../LINEMOD/benchvise/JPEGImages/000945.jpg
../LINEMOD/benchvise/JPEGImages/000954.jpg
../LINEMOD/benchvise/JPEGImages/000959.jpg
../LINEMOD/benchvise/JPEGImages/000964.jpg
../LINEMOD/benchvise/JPEGImages/000975.jpg
../LINEMOD/benchvise/JPEGImages/000987.jpg
../LINEMOD/benchvise/JPEGImages/001002.jpg
../LINEMOD/benchvise/JPEGImages/001014.jpg
../LINEMOD/benchvise/JPEGImages/001020.jpg
../LINEMOD/benchvise/JPEGImages/001024.jpg
../LINEMOD/benchvise/JPEGImages/001038.jpg
../LINEMOD/benchvise/JPEGImages/001040.jpg
../LINEMOD/benchvise/JPEGImages/001048.jpg
../LINEMOD/benchvise/JPEGImages/001066.jpg
../LINEMOD/benchvise/JPEGImages/001071.jpg
../LINEMOD/benchvise/JPEGImages/001081.jpg
../LINEMOD/benchvise/JPEGImages/001084.jpg
../LINEMOD/benchvise/JPEGImages/001088.jpg
../LINEMOD/benchvise/JPEGImages/001102.jpg
../LINEMOD/benchvise/JPEGImages/001103.jpg
../LINEMOD/benchvise/JPEGImages/001106.jpg
../LINEMOD/benchvise/JPEGImages/001112.jpg
../LINEMOD/benchvise/JPEGImages/001121.jpg
../LINEMOD/benchvise/JPEGImages/001129.jpg
../LINEMOD/benchvise/JPEGImages/001133.jpg
../LINEMOD/benchvise/JPEGImages/001135.jpg
../LINEMOD/benchvise/JPEGImages/001136.jpg
../LINEMOD/benchvise/JPEGImages/001157.jpg
../LINEMOD/benchvise/JPEGImages/001159.jpg
../LINEMOD/benchvise/JPEGImages/001163.jpg
../LINEMOD/benchvise/JPEGImages/001171.jpg
../LINEMOD/benchvise/JPEGImages/001172.jpg
../LINEMOD/benchvise/JPEGImages/001174.jpg
../LINEMOD/benchvise/JPEGImages/001191.jpg
../LINEMOD/benchvise/JPEGImages/001198.jpg
../LINEMOD/benchvise/JPEGImages/001205.jpg

Просмотреть файл

@ -0,0 +1,261 @@
[net]
# Testing
batch=64
subdivisions=8
# Training
# batch=64
# subdivisions=8
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
steps=-1,500,20000,30000
# steps=-1,180,360,540
scales=0.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
#######
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[route]
layers=-9
[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky
[reorg]
stride=2
[route]
layers=-1,-4
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
# filters=125
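# for the multi-object cfg: (18 coords + 13 classes + 1 conf) * 5 anchors = 160 output filters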
filters=160
activation=linear
[region]
# anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
anchors = 1.4820, 2.2412, 2.0501, 3.1265, 2.3946, 4.6891, 3.1018, 3.9910, 3.4879, 5.8851
bias_match=1
classes=13
coords=18
num=5
softmax=1
jitter=.3
rescore=1
object_scale=0
noobject_scale=0
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1

Просмотреть файл

@ -0,0 +1,261 @@
[net]
# Testing
batch=32
subdivisions=8
# Training
# batch=64
# subdivisions=8
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
steps=-1,100,20000,30000
# steps=-1,180,360,540
scales=0.1,10,.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
#######
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[route]
layers=-9
[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky
[reorg]
stride=2
[route]
layers=-1,-4
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
# filters=125
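# for the multi-object cfg: (18 coords + 13 classes + 1 conf) * 5 anchors = 160 output filters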
filters=160
activation=linear
[region]
# anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
anchors = 1.4820, 2.2412, 2.0501, 3.1265, 2.3946, 4.6891, 3.1018, 3.9910, 3.4879, 5.8851
bias_match=1
classes=13
coords=18
num=5
softmax=1
jitter=.3
rescore=1
object_scale=5
noobject_scale=0.1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1

Просмотреть файл

@ -0,0 +1,388 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from region_loss_multi import RegionLoss
from cfg import *
class MaxPoolStride1(nn.Module):
def __init__(self):
super(MaxPoolStride1, self).__init__()
def forward(self, x):
x = F.max_pool2d(F.pad(x, (0,1,0,1), mode='replicate'), 2, stride=1)
return x
class Reorg(nn.Module):
def __init__(self, stride=2):
super(Reorg, self).__init__()
self.stride = stride
def forward(self, x):
stride = self.stride
assert(x.data.dim() == 4)
B = x.data.size(0)
C = x.data.size(1)
H = x.data.size(2)
W = x.data.size(3)
assert(H % stride == 0)
assert(W % stride == 0)
ws = stride
hs = stride
x = x.view(B, C, H/hs, hs, W/ws, ws).transpose(3,4).contiguous()
x = x.view(B, C, H/hs*W/ws, hs*ws).transpose(2,3).contiguous()
x = x.view(B, C, hs*ws, H/hs, W/ws).transpose(1,2).contiguous()
x = x.view(B, hs*ws*C, H/hs, W/ws)
return x
class GlobalAvgPool2d(nn.Module):
def __init__(self):
super(GlobalAvgPool2d, self).__init__()
def forward(self, x):
N = x.data.size(0)
C = x.data.size(1)
H = x.data.size(2)
W = x.data.size(3)
x = F.avg_pool2d(x, (H, W))
x = x.view(N, C)
return x
# for route and shortcut
class EmptyModule(nn.Module):
def __init__(self):
super(EmptyModule, self).__init__()
def forward(self, x):
return x
# support route shortcut and reorg
class Darknet(nn.Module):
def __init__(self, cfgfile):
super(Darknet, self).__init__()
self.blocks = parse_cfg(cfgfile)
self.models = self.create_network(self.blocks) # merge conv, bn,leaky
self.loss = self.models[len(self.models)-1]
self.width = int(self.blocks[0]['width'])
self.height = int(self.blocks[0]['height'])
if self.blocks[(len(self.blocks)-1)]['type'] == 'region':
self.anchors = self.loss.anchors
self.num_anchors = self.loss.num_anchors
self.anchor_step = self.loss.anchor_step
self.num_classes = self.loss.num_classes
self.header = torch.IntTensor([0,0,0,0])
self.seen = 0
self.iter = 0
def forward(self, x):
ind = -2
self.loss = None
outputs = dict()
for block in self.blocks:
ind = ind + 1
#if ind > 0:
# return x
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional' or block['type'] == 'maxpool' or block['type'] == 'reorg' or block['type'] == 'avgpool' or block['type'] == 'softmax' or block['type'] == 'connected':
x = self.models[ind](x)
outputs[ind] = x
elif block['type'] == 'route':
layers = block['layers'].split(',')
layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
if len(layers) == 1:
x = outputs[layers[0]]
outputs[ind] = x
elif len(layers) == 2:
x1 = outputs[layers[0]]
x2 = outputs[layers[1]]
x = torch.cat((x1,x2),1)
outputs[ind] = x
elif block['type'] == 'shortcut':
from_layer = int(block['from'])
activation = block['activation']
from_layer = from_layer if from_layer > 0 else from_layer + ind
x1 = outputs[from_layer]
x2 = outputs[ind-1]
x = x1 + x2
if activation == 'leaky':
x = F.leaky_relu(x, 0.1, inplace=True)
elif activation == 'relu':
x = F.relu(x, inplace=True)
outputs[ind] = x
elif block['type'] == 'region':
continue
if self.loss:
self.loss = self.loss + self.models[ind](x)
else:
self.loss = self.models[ind](x)
outputs[ind] = None
elif block['type'] == 'cost':
continue
else:
print('unknown type %s' % (block['type']))
return x
def print_network(self):
print_cfg(self.blocks)
def create_network(self, blocks):
models = nn.ModuleList()
prev_filters = 3
out_filters =[]
conv_id = 0
for block in blocks:
if block['type'] == 'net':
prev_filters = int(block['channels'])
continue
elif block['type'] == 'convolutional':
conv_id = conv_id + 1
batch_normalize = int(block['batch_normalize'])
filters = int(block['filters'])
kernel_size = int(block['size'])
stride = int(block['stride'])
is_pad = int(block['pad'])
pad = (kernel_size-1)/2 if is_pad else 0
activation = block['activation']
model = nn.Sequential()
if batch_normalize:
model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=False))
model.add_module('bn{0}'.format(conv_id), nn.BatchNorm2d(filters, eps=1e-4))
#model.add_module('bn{0}'.format(conv_id), BN2d(filters))
else:
model.add_module('conv{0}'.format(conv_id), nn.Conv2d(prev_filters, filters, kernel_size, stride, pad))
if activation == 'leaky':
model.add_module('leaky{0}'.format(conv_id), nn.LeakyReLU(0.1, inplace=True))
elif activation == 'relu':
model.add_module('relu{0}'.format(conv_id), nn.ReLU(inplace=True))
prev_filters = filters
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'maxpool':
pool_size = int(block['size'])
stride = int(block['stride'])
if stride > 1:
model = nn.MaxPool2d(pool_size, stride)
else:
model = MaxPoolStride1()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'avgpool':
model = GlobalAvgPool2d()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'softmax':
model = nn.Softmax()
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'cost':
if block['_type'] == 'sse':
model = nn.MSELoss(size_average=True)
elif block['_type'] == 'L1':
model = nn.L1Loss(size_average=True)
elif block['_type'] == 'smooth':
model = nn.SmoothL1Loss(size_average=True)
out_filters.append(1)
models.append(model)
elif block['type'] == 'reorg':
stride = int(block['stride'])
prev_filters = stride * stride * prev_filters
out_filters.append(prev_filters)
models.append(Reorg(stride))
elif block['type'] == 'route':
layers = block['layers'].split(',')
ind = len(models)
layers = [int(i) if int(i) > 0 else int(i)+ind for i in layers]
if len(layers) == 1:
prev_filters = out_filters[layers[0]]
elif len(layers) == 2:
assert(layers[0] == ind - 1)
prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
out_filters.append(prev_filters)
models.append(EmptyModule())
elif block['type'] == 'shortcut':
ind = len(models)
prev_filters = out_filters[ind-1]
out_filters.append(prev_filters)
models.append(EmptyModule())
elif block['type'] == 'connected':
filters = int(block['output'])
if block['activation'] == 'linear':
model = nn.Linear(prev_filters, filters)
elif block['activation'] == 'leaky':
model = nn.Sequential(
nn.Linear(prev_filters, filters),
nn.LeakyReLU(0.1, inplace=True))
elif block['activation'] == 'relu':
model = nn.Sequential(
nn.Linear(prev_filters, filters),
nn.ReLU(inplace=True))
prev_filters = filters
out_filters.append(prev_filters)
models.append(model)
elif block['type'] == 'region':
loss = RegionLoss()
anchors = block['anchors'].split(',')
loss.anchors = [float(i) for i in anchors]
loss.num_classes = int(block['classes'])
loss.num_anchors = int(block['num'])
loss.anchor_step = len(loss.anchors)/loss.num_anchors
loss.object_scale = float(block['object_scale'])
loss.noobject_scale = float(block['noobject_scale'])
loss.class_scale = float(block['class_scale'])
loss.coord_scale = float(block['coord_scale'])
out_filters.append(prev_filters)
models.append(loss)
else:
print('unknown type %s' % (block['type']))
return models
def load_weights(self, weightfile):
fp = open(weightfile, 'rb')
header = np.fromfile(fp, count=4, dtype=np.int32)
self.header = torch.from_numpy(header)
self.seen = self.header[3]
buf = np.fromfile(fp, dtype = np.float32)
fp.close()
start = 0
ind = -2
for block in self.blocks:
if start >= buf.size:
break
ind = ind + 1
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
start = load_conv_bn(buf, start, model[0], model[1])
else:
start = load_conv(buf, start, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
start = load_fc(buf, start, model[0])
else:
start = load_fc(buf, start, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
def load_weights_until_last(self, weightfile):
fp = open(weightfile, 'rb')
header = np.fromfile(fp, count=4, dtype=np.int32)
self.header = torch.from_numpy(header)
self.seen = self.header[3]
buf = np.fromfile(fp, dtype = np.float32)
fp.close()
start = 0
ind = -2
blocklen = len(self.blocks)
for i in range(blocklen-2):
block = self.blocks[i]
if start >= buf.size:
break
ind = ind + 1
if block['type'] == 'net':
continue
elif block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
start = load_conv_bn(buf, start, model[0], model[1])
else:
start = load_conv(buf, start, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
start = load_fc(buf, start, model[0])
else:
start = load_fc(buf, start, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
def save_weights(self, outfile, cutoff=0):
if cutoff <= 0:
cutoff = len(self.blocks)-1
fp = open(outfile, 'wb')
self.header[3] = self.seen
header = self.header
header.numpy().tofile(fp)
ind = -1
for blockId in range(1, cutoff+1):
ind = ind + 1
block = self.blocks[blockId]
if block['type'] == 'convolutional':
model = self.models[ind]
batch_normalize = int(block['batch_normalize'])
if batch_normalize:
save_conv_bn(fp, model[0], model[1])
else:
save_conv(fp, model[0])
elif block['type'] == 'connected':
model = self.models[ind]
if block['activation'] != 'linear':
save_fc(fp, model[0])
else:
save_fc(fp, model)
elif block['type'] == 'maxpool':
pass
elif block['type'] == 'reorg':
pass
elif block['type'] == 'route':
pass
elif block['type'] == 'shortcut':
pass
elif block['type'] == 'region':
pass
elif block['type'] == 'avgpool':
pass
elif block['type'] == 'softmax':
pass
elif block['type'] == 'cost':
pass
else:
print('unknown type %s' % (block['type']))
fp.close()

Просмотреть файл

@ -0,0 +1,94 @@
#!/usr/bin/python
# encoding: utf-8
import os
import random
import torch
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from utils import read_truths_args, read_truths, get_all_files
from image_multi import *
class listDataset(Dataset):
def __init__(self, root, shape=None, shuffle=True, transform=None, objclass=None, target_transform=None, train=False, seen=0, batch_size=64, num_workers=4, bg_file_names=None): # bg='/cvlabdata1/home/btekin/ope/data/office_bg'
with open(root, 'r') as file:
self.lines = file.readlines()
if shuffle:
random.shuffle(self.lines)
self.nSamples = len(self.lines)
self.transform = transform
self.target_transform = target_transform
self.train = train
self.shape = shape
self.seen = seen
self.batch_size = batch_size
self.num_workers = num_workers
# self.bg_file_names = get_all_files(bg)
self.bg_file_names = bg_file_names
self.objclass = objclass
def __len__(self):
return self.nSamples
def __getitem__(self, index):
assert index <= len(self), 'index range error'
imgpath = self.lines[index].rstrip()
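# multi-scale training: every 64th sample, pick a new square input resolution (a multiple of 32);
# the range of allowed sizes widens as more samples are seen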
if self.train and index % 64 == 0:
if self.seen < 4000*64:
width = 13*32
self.shape = (width, width)
elif self.seen < 8000*64:
width = (random.randint(0,3) + 13)*32
self.shape = (width, width)
elif self.seen < 12000*64:
width = (random.randint(0,5) + 12)*32
self.shape = (width, width)
elif self.seen < 16000*64:
width = (random.randint(0,7) + 11)*32
self.shape = (width, width)
else: # self.seen < 20000*64:
width = (random.randint(0,9) + 10)*32
self.shape = (width, width)
if self.train:
# jitter = 0.2
jitter = 0.1
hue = 0.05
saturation = 1.5
exposure = 1.5
# Get background image path
random_bg_index = random.randint(0, len(self.bg_file_names) - 1)
bgpath = self.bg_file_names[random_bg_index]
img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
label = torch.from_numpy(label)
else:
img = Image.open(imgpath).convert('RGB')
if self.shape:
img = img.resize(self.shape)
labpath = imgpath.replace('benchvise', self.objclass).replace('images', 'labels_occlusion').replace('JPEGImages', 'labels_occlusion').replace('.jpg', '.txt').replace('.png','.txt')
label = torch.zeros(50*21)
if os.path.getsize(labpath):
ow, oh = img.size
tmp = torch.from_numpy(read_truths_args(labpath, 8.0/ow))
tmp = tmp.view(-1)
tsz = tmp.numel()
if tsz > 50*21:
label = tmp[0:50*21]
elif tsz > 0:
label[0:tsz] = tmp
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
label = self.target_transform(label)
self.seen = self.seen + self.num_workers
return (img, label)

Просмотреть файл

@ -0,0 +1,450 @@
#!/usr/bin/python
# encoding: utf-8
import random
import os
from PIL import Image, ImageChops, ImageMath
import numpy as np
def load_data_detection_backup(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
## data augmentation
img = Image.open(imgpath).convert('RGB')
mask = Image.open(maskpath).convert('RGB')
bg = Image.open(bgpath).convert('RGB')
img = change_background(img, mask, bg)
img,flip,dx,dy,sx,sy = data_augmentation(img, shape, jitter, hue, saturation, exposure)
label = fill_truth_detection(labpath, img.width, img.height, flip, dx, dy, 1./sx, 1./sy)
return img,label
def get_add_objs(objname):
# Decide how many additional objects you will augment and what will be the other types of objects
if objname == 'ape':
add_objs = ['can', 'cat', 'duck', 'glue', 'holepuncher', 'iron', 'phone'] # eggbox
elif objname == 'benchvise':
add_objs = ['ape', 'can', 'cat', 'driller', 'duck', 'glue', 'holepuncher']
elif objname == 'cam':
add_objs = ['ape', 'benchvise', 'can', 'cat', 'driller', 'duck', 'holepuncher']
elif objname == 'can':
add_objs = ['ape', 'benchvise', 'cat', 'driller', 'duck', 'eggbox', 'holepuncher']
elif objname == 'cat':
add_objs = ['ape', 'can', 'duck', 'glue', 'holepuncher', 'eggbox', 'phone']
elif objname == 'driller':
add_objs = ['ape', 'benchvise', 'can', 'cat', 'duck', 'glue', 'holepuncher']
elif objname == 'duck':
add_objs = ['ape', 'can', 'cat', 'eggbox', 'glue', 'holepuncher', 'phone']
elif objname == 'eggbox':
add_objs = ['ape', 'benchvise', 'cam', 'can', 'cat', 'duck', 'glue', 'holepuncher']
elif objname == 'glue':
add_objs = ['ape', 'benchvise', 'cam', 'driller', 'duck', 'eggbox', 'holepuncher' ]
elif objname == 'holepuncher':
add_objs = ['benchvise', 'cam', 'can', 'cat', 'driller', 'duck', 'eggbox']
elif objname == 'iron':
add_objs = ['ape', 'benchvise', 'can', 'cat', 'driller', 'duck', 'glue']
elif objname == 'lamp':
add_objs = ['ape', 'benchvise', 'can', 'driller', 'eggbox', 'holepuncher', 'iron']
elif objname == 'phone':
add_objs = ['ape', 'benchvise', 'cam', 'can', 'driller', 'duck', 'holepuncher']
return add_objs
def mask_background(img, mask):
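# Keep only the foreground pixels of img where the mask is on; background pixels are zeroed out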
ow, oh = img.size
imcs = list(img.split())
maskcs = list(mask.split())
fics = list(Image.new(img.mode, img.size).split())
for c in range(len(imcs)):
negmask = maskcs[c].point(lambda i: 1 - i / 255)
posmask = maskcs[c].point(lambda i: i / 255)
fics[c] = ImageMath.eval("a * c", a=imcs[c], c=posmask).convert('L')
out = Image.merge(img.mode, tuple(fics))
return out
def scale_image_channel(im, c, v):
cs = list(im.split())
cs[c] = cs[c].point(lambda i: i * v)
out = Image.merge(im.mode, tuple(cs))
return out
def distort_image(im, hue, sat, val):
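# Colour jitter in HSV space: scale the saturation and value channels and shift the hue channel with wrap-around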
im = im.convert('HSV')
cs = list(im.split())
cs[1] = cs[1].point(lambda i: i * sat)
cs[2] = cs[2].point(lambda i: i * val)
def change_hue(x):
x += hue*255
if x > 255:
x -= 255
if x < 0:
x += 255
return x
cs[0] = cs[0].point(change_hue)
im = Image.merge(im.mode, tuple(cs))
im = im.convert('RGB')
#constrain_image(im)
return im
def rand_scale(s):
scale = random.uniform(1, s)
if(random.randint(1,10000)%2):
return scale
return 1./scale
def random_distort_image(im, hue, saturation, exposure):
dhue = random.uniform(-hue, hue)
dsat = rand_scale(saturation)
dexp = rand_scale(exposure)
res = distort_image(im, dhue, dsat, dexp)
return res
def data_augmentation(img, shape, jitter, hue, saturation, exposure):
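# Random crop/scale, optional horizontal flip, resize to the network input shape and colour distortion; also returns the flip flag and the offsets/scales needed to remap the labels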
oh = img.height
ow = img.width
dw =int(ow*jitter)
dh =int(oh*jitter)
pleft = random.randint(-dw, dw)
pright = random.randint(-dw, dw)
ptop = random.randint(-dh, dh)
pbot = random.randint(-dh, dh)
swidth = ow - pleft - pright
sheight = oh - ptop - pbot
sx = float(swidth) / ow
sy = float(sheight) / oh
flip = random.randint(1,10000)%2
cropped = img.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
dx = (float(pleft)/ow)/sx
dy = (float(ptop) /oh)/sy
sized = cropped.resize(shape)
if flip:
sized = sized.transpose(Image.FLIP_LEFT_RIGHT)
img = random_distort_image(sized, hue, saturation, exposure)
return img, flip, dx,dy,sx,sy
def fill_truth_detection(labpath, w, h, flip, dx, dy, sx, sy):
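# Each ground-truth row holds 21 values: class id, 9 control points (centroid + 8 box corners) as normalized x/y pairs remapped through the augmentation, and the width/height of their 2D extent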
max_boxes = 50
label = np.zeros((max_boxes,21))
if os.path.getsize(labpath):
bs = np.loadtxt(labpath)
if bs is None:
return label
bs = np.reshape(bs, (-1, 21))
cc = 0
for i in range(bs.shape[0]):
x0 = bs[i][1]
y0 = bs[i][2]
x1 = bs[i][3]
y1 = bs[i][4]
x2 = bs[i][5]
y2 = bs[i][6]
x3 = bs[i][7]
y3 = bs[i][8]
x4 = bs[i][9]
y4 = bs[i][10]
x5 = bs[i][11]
y5 = bs[i][12]
x6 = bs[i][13]
y6 = bs[i][14]
x7 = bs[i][15]
y7 = bs[i][16]
x8 = bs[i][17]
y8 = bs[i][18]
x0 = min(0.999, max(0, x0 * sx - dx))
y0 = min(0.999, max(0, y0 * sy - dy))
x1 = min(0.999, max(0, x1 * sx - dx))
y1 = min(0.999, max(0, y1 * sy - dy))
x2 = min(0.999, max(0, x2 * sx - dx))
y2 = min(0.999, max(0, y2 * sy - dy))
x3 = min(0.999, max(0, x3 * sx - dx))
y3 = min(0.999, max(0, y3 * sy - dy))
x4 = min(0.999, max(0, x4 * sx - dx))
y4 = min(0.999, max(0, y4 * sy - dy))
x5 = min(0.999, max(0, x5 * sx - dx))
y5 = min(0.999, max(0, y5 * sy - dy))
x6 = min(0.999, max(0, x6 * sx - dx))
y6 = min(0.999, max(0, y6 * sy - dy))
x7 = min(0.999, max(0, x7 * sx - dx))
y7 = min(0.999, max(0, y7 * sy - dy))
x8 = min(0.999, max(0, x8 * sx - dx))
y8 = min(0.999, max(0, y8 * sy - dy))
bs[i][0] = bs[i][0]
bs[i][1] = x0
bs[i][2] = y0
bs[i][3] = x1
bs[i][4] = y1
bs[i][5] = x2
bs[i][6] = y2
bs[i][7] = x3
bs[i][8] = y3
bs[i][9] = x4
bs[i][10] = y4
bs[i][11] = x5
bs[i][12] = y5
bs[i][13] = x6
bs[i][14] = y6
bs[i][15] = x7
bs[i][16] = y7
bs[i][17] = x8
bs[i][18] = y8
xs = [x1, x2, x3, x4, x5, x6, x7, x8]
ys = [y1, y2, y3, y4, y5, y6, y7, y8]
min_x = min(xs)
max_x = max(xs)
min_y = min(ys)
max_y = max(ys)
bs[i][19] = max_x - min_x
bs[i][20] = max_y - min_y
if flip:
bs[i][1] = 0.999 - bs[i][1]
bs[i][3] = 0.999 - bs[i][3]
bs[i][5] = 0.999 - bs[i][5]
bs[i][7] = 0.999 - bs[i][7]
bs[i][9] = 0.999 - bs[i][9]
bs[i][11] = 0.999 - bs[i][11]
bs[i][13] = 0.999 - bs[i][13]
bs[i][15] = 0.999 - bs[i][15]
bs[i][17] = 0.999 - bs[i][17]
label[cc] = bs[i]
cc += 1
if cc >= 50:
break
label = np.reshape(label, (-1))
return label
def change_background(img, mask, bg):
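# Composite the object onto a new background: keep img where the mask is on, use bg elsewhere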
ow, oh = img.size
bg = bg.resize((ow, oh)).convert('RGB')
imcs = list(img.split())
bgcs = list(bg.split())
maskcs = list(mask.split())
fics = list(Image.new(img.mode, img.size).split())
for c in range(len(imcs)):
negmask = maskcs[c].point(lambda i: 1 - i / 255)
posmask = maskcs[c].point(lambda i: i / 255)
fics[c] = ImageMath.eval("a * c + b * d", a=imcs[c], b=bgcs[c], c=posmask, d=negmask).convert('L')
out = Image.merge(img.mode, tuple(fics))
return out
def shifted_data_augmentation_with_mask(img, mask, shape, jitter, hue, saturation, exposure):
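# Like data_augmentation_with_mask, but additionally shifts the crop by a random offset of up to 80 px in each direction so that pasted objects land at varied image locations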
ow, oh = img.size
dw =int(ow*jitter)
dh =int(oh*jitter)
pleft = random.randint(-dw, dw)
pright = random.randint(-dw, dw)
ptop = random.randint(-dh, dh)
pbot = random.randint(-dh, dh)
swidth = ow - pleft - pright
sheight = oh - ptop - pbot
sx = float(swidth) / ow
sy = float(sheight) / oh
flip = random.randint(1,10000)%2
cropped = img.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
mask_cropped = mask.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
cw, ch = cropped.size
shift_x = random.randint(-80, 80)
shift_y = random.randint(-80, 80)
dx = (float(pleft)/ow)/sx - (float(shift_x)/shape[0]) # FIX HERE
dy = (float(ptop) /oh)/sy - (float(shift_y)/shape[1]) # FIX HERE
# dx = (float(pleft)/ow)/sx - (float(shift_x)/ow)
# dy = (float(ptop) /oh)/sy - (float(shift_y)/oh)
sized = cropped.resize(shape)
mask_sized = mask_cropped.resize(shape)
sized = ImageChops.offset(sized, shift_x, shift_y)
mask_sized = ImageChops.offset(mask_sized, shift_x, shift_y)
if flip:
sized = sized.transpose(Image.FLIP_LEFT_RIGHT)
mask_sized = mask_sized.transpose(Image.FLIP_LEFT_RIGHT)
img = sized
mask = mask_sized
return img, mask, flip, dx,dy,sx,sy
def data_augmentation_with_mask(img, mask, shape, jitter, hue, saturation, exposure):
ow, oh = img.size
dw =int(ow*jitter)
dh =int(oh*jitter)
pleft = random.randint(-dw, dw)
pright = random.randint(-dw, dw)
ptop = random.randint(-dh, dh)
pbot = random.randint(-dh, dh)
swidth = ow - pleft - pright
sheight = oh - ptop - pbot
sx = float(swidth) / ow
sy = float(sheight) / oh
flip = random.randint(1,10000)%2
cropped = img.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
mask_cropped = mask.crop( (pleft, ptop, pleft + swidth - 1, ptop + sheight - 1))
dx = (float(pleft)/ow)/sx
dy = (float(ptop) /oh)/sy
sized = cropped.resize(shape)
mask_sized = mask_cropped.resize(shape)
if flip:
sized = sized.transpose(Image.FLIP_LEFT_RIGHT)
mask_sized = mask_sized.transpose(Image.FLIP_LEFT_RIGHT)
img = sized
mask = mask_sized
return img, mask, flip, dx,dy,sx,sy
def superimpose_masked_imgs(masked_img, mask, total_mask):
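# Overlay a newly masked object onto the running composite: take the new object where its mask is on, keep the existing composite elsewhere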
ow, oh = masked_img.size
total_mask = total_mask.resize((ow, oh)).convert('RGB')
imcs = list(masked_img.split())
bgcs = list(total_mask.split())
maskcs = list(mask.split())
fics = list(Image.new(masked_img.mode, masked_img.size).split())
for c in range(len(imcs)):
negmask = maskcs[c].point(lambda i: 1 - i / 255)
posmask = maskcs[c].point(lambda i: i / 255)
fics[c] = ImageMath.eval("a * c + b * d", a=imcs[c], b=bgcs[c], c=posmask, d=negmask).convert('L')
out = Image.merge(masked_img.mode, tuple(fics))
return out
def superimpose_masks(mask, total_mask):
# bg: total_mask
ow, oh = mask.size
total_mask = total_mask.resize((ow, oh)).convert('RGB')
total_maskcs = list(total_mask.split())
maskcs = list(mask.split())
fics = list(Image.new(mask.mode, mask.size).split())
for c in range(len(maskcs)):
negmask = maskcs[c].point(lambda i: 1 - i / 255)
posmask = maskcs[c].point(lambda i: i)
fics[c] = ImageMath.eval("c + b * d", b=total_maskcs[c], c=posmask, d=negmask).convert('L')
out = Image.merge(mask.mode, tuple(fics))
return out
def augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure):
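# Multi-object augmentation: paste randomly augmented instances of the additional objects into the scene, accepting a placement only if less than 20% of the new object overlaps what is already there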
pixelThreshold = 200
random.shuffle(add_objs)
labpath = imgpath.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
maskpath = imgpath.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
# Read the image and the mask
img = Image.open(imgpath).convert('RGB')
iw, ih = img.size
mask = Image.open(maskpath).convert('RGB')
img,mask,flip,dx,dy,sx,sy = shifted_data_augmentation_with_mask(img, mask, shape, jitter, hue, saturation, exposure)
label = fill_truth_detection(labpath, iw, ih, flip, dx, dy, 1./sx, 1./sy)
total_label = np.reshape(label, (-1, 21))
# Mask the background
masked_img = mask_background(img, mask)
mask = mask.resize(shape)
masked_img = masked_img.resize(shape)
# Initialize the total mask and total masked image
total_mask = mask
total_masked_img = masked_img
count = 1
for obj in add_objs:
successful = False
while not successful:
objpath = '../LINEMOD/' + obj + '/train.txt'
with open(objpath, 'r') as objfile:
objlines = objfile.readlines()
rand_index = random.randint(0, len(objlines) - 1)
obj_rand_img_path = '../' + objlines[rand_index].rstrip()
obj_rand_mask_path = obj_rand_img_path.replace('JPEGImages', 'mask').replace('/00', '/').replace('.jpg', '.png')
obj_rand_lab_path = obj_rand_img_path.replace('images', 'labels').replace('JPEGImages', 'labels').replace('.jpg', '.txt').replace('.png','.txt')
obj_rand_img = Image.open(obj_rand_img_path).convert('RGB')
obj_rand_mask = Image.open(obj_rand_mask_path).convert('RGB')
obj_rand_masked_img = mask_background(obj_rand_img, obj_rand_mask)
obj_rand_masked_img,obj_rand_mask,flip,dx,dy,sx,sy = data_augmentation_with_mask(obj_rand_masked_img, obj_rand_mask, shape, jitter, hue, saturation, exposure)
obj_rand_label = fill_truth_detection(obj_rand_lab_path, iw, ih, flip, dx, dy, 1./sx, 1./sy)
# compute intersection (ratio of the object part intersecting with other object parts over the area of the object)
xx = np.array(obj_rand_mask)
xx = np.where(xx > pixelThreshold, 1, 0)
yy = np.array(total_mask)
yy = np.where(yy > pixelThreshold, 1, 0)
intersection = (xx * yy)
if (np.sum(xx) < 0.01) and (np.sum(xx) > -0.01):
successful = False
continue
intersection_ratio = float(np.sum(intersection)) / float(np.sum(xx))
if intersection_ratio < 0.2:
successful = True
total_mask = superimpose_masks(obj_rand_mask, total_mask) # total_mask + obj_rand_mask
total_masked_img = superimpose_masked_imgs(obj_rand_masked_img, obj_rand_mask, total_masked_img) # total_masked_img + obj_rand_masked_img
obj_rand_label = np.reshape(obj_rand_label, (-1, 21))
total_label[count, :] = obj_rand_label[0, :]
count = count + 1
else:
successful = False
total_masked_img = superimpose_masked_imgs(masked_img, mask, total_masked_img)
return total_masked_img, np.reshape(total_label, (-1)), total_mask
def load_data_detection(imgpath, shape, jitter, hue, saturation, exposure, bgpath):
# Read the background image
bg = Image.open(bgpath).convert('RGB')
# Understand which object it is and get the neighboring objects
dirname = os.path.dirname(os.path.dirname(imgpath)) ## dir of dir of file
objname = os.path.basename(dirname)
add_objs = get_add_objs(objname)
# Add additional objects in the scene, apply data augmentation on the objects
total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
img = change_background(total_masked_img, total_mask, bg)
lb = np.reshape(label, (-1, 21))
return img,label


@@ -0,0 +1,309 @@
import time
import torch
import math
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils import *
def build_targets(pred_corners, target, anchors, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
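# Build the per-cell targets on the nA x nH x nW grid: for every ground truth, the best-matching anchor in its cell receives the 9 control-point offsets, a confidence target and the class label; cells whose predicted corners already match a ground truth well are excluded from the no-object confidence penalty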
nB = target.size(0)
nA = num_anchors
nC = num_classes
anchor_step = len(anchors)//num_anchors
conf_mask = torch.ones(nB, nA, nH, nW) * noobject_scale
coord_mask = torch.zeros(nB, nA, nH, nW)
cls_mask = torch.zeros(nB, nA, nH, nW)
tx0 = torch.zeros(nB, nA, nH, nW)
ty0 = torch.zeros(nB, nA, nH, nW)
tx1 = torch.zeros(nB, nA, nH, nW)
ty1 = torch.zeros(nB, nA, nH, nW)
tx2 = torch.zeros(nB, nA, nH, nW)
ty2 = torch.zeros(nB, nA, nH, nW)
tx3 = torch.zeros(nB, nA, nH, nW)
ty3 = torch.zeros(nB, nA, nH, nW)
tx4 = torch.zeros(nB, nA, nH, nW)
ty4 = torch.zeros(nB, nA, nH, nW)
tx5 = torch.zeros(nB, nA, nH, nW)
ty5 = torch.zeros(nB, nA, nH, nW)
tx6 = torch.zeros(nB, nA, nH, nW)
ty6 = torch.zeros(nB, nA, nH, nW)
tx7 = torch.zeros(nB, nA, nH, nW)
ty7 = torch.zeros(nB, nA, nH, nW)
tx8 = torch.zeros(nB, nA, nH, nW)
ty8 = torch.zeros(nB, nA, nH, nW)
tconf = torch.zeros(nB, nA, nH, nW)
tcls = torch.zeros(nB, nA, nH, nW)
nAnchors = nA*nH*nW
nPixels = nH*nW
for b in range(nB):
cur_pred_corners = pred_corners[b*nAnchors:(b+1)*nAnchors].t()
cur_confs = torch.zeros(nAnchors)
for t in range(50):
if target[b][t*21+1] == 0:
break
gx0 = target[b][t*21+1]*nW
gy0 = target[b][t*21+2]*nH
gx1 = target[b][t*21+3]*nW
gy1 = target[b][t*21+4]*nH
gx2 = target[b][t*21+5]*nW
gy2 = target[b][t*21+6]*nH
gx3 = target[b][t*21+7]*nW
gy3 = target[b][t*21+8]*nH
gx4 = target[b][t*21+9]*nW
gy4 = target[b][t*21+10]*nH
gx5 = target[b][t*21+11]*nW
gy5 = target[b][t*21+12]*nH
gx6 = target[b][t*21+13]*nW
gy6 = target[b][t*21+14]*nH
gx7 = target[b][t*21+15]*nW
gy7 = target[b][t*21+16]*nH
gx8 = target[b][t*21+17]*nW
gy8 = target[b][t*21+18]*nH
cur_gt_corners = torch.FloatTensor([gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]).repeat(nAnchors,1).t() # 16 x nAnchors
cur_confs = torch.max(cur_confs, corner_confidences9(cur_pred_corners, cur_gt_corners)) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
conf_mask[b][cur_confs>sil_thresh] = 0
if seen < -1:#6400:
tx0.fill_(0.5)
ty0.fill_(0.5)
tx1.fill_(0.5)
ty1.fill_(0.5)
tx2.fill_(0.5)
ty2.fill_(0.5)
tx3.fill_(0.5)
ty3.fill_(0.5)
tx4.fill_(0.5)
ty4.fill_(0.5)
tx5.fill_(0.5)
ty5.fill_(0.5)
tx6.fill_(0.5)
ty6.fill_(0.5)
tx7.fill_(0.5)
ty7.fill_(0.5)
tx8.fill_(0.5)
ty8.fill_(0.5)
coord_mask.fill_(1)
nGT = 0
nCorrect = 0
for b in range(nB):
for t in range(50):
if target[b][t*21+1] == 0:
break
nGT = nGT + 1
best_iou = 0.0
best_n = -1
min_dist = 10000
gx0 = target[b][t*21+1] * nW
gy0 = target[b][t*21+2] * nH
gi0 = int(gx0)
gj0 = int(gy0)
gx1 = target[b][t*21+3] * nW
gy1 = target[b][t*21+4] * nH
gx2 = target[b][t*21+5] * nW
gy2 = target[b][t*21+6] * nH
gx3 = target[b][t*21+7] * nW
gy3 = target[b][t*21+8] * nH
gx4 = target[b][t*21+9] * nW
gy4 = target[b][t*21+10] * nH
gx5 = target[b][t*21+11] * nW
gy5 = target[b][t*21+12] * nH
gx6 = target[b][t*21+13] * nW
gy6 = target[b][t*21+14] * nH
gx7 = target[b][t*21+15] * nW
gy7 = target[b][t*21+16] * nH
gx8 = target[b][t*21+17] * nW
gy8 = target[b][t*21+18] * nH
gw = target[b][t*21+19]*nW
gh = target[b][t*21+20]*nH
gt_box = [0, 0, gw, gh]
for n in range(nA):
aw = anchors[anchor_step*n]
ah = anchors[anchor_step*n+1]
anchor_box = [0, 0, aw, ah]
iou = bbox_iou(anchor_box, gt_box, x1y1x2y2=False)
if iou > best_iou:
best_iou = iou
best_n = n
gt_box = [gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]
pred_box = pred_corners[b*nAnchors+best_n*nPixels+gj0*nW+gi0]
conf = corner_confidence9(gt_box, pred_box)
coord_mask[b][best_n][gj0][gi0] = 1
cls_mask[b][best_n][gj0][gi0] = 1
conf_mask[b][best_n][gj0][gi0] = object_scale
tx0[b][best_n][gj0][gi0] = target[b][t*21+1] * nW - gi0
ty0[b][best_n][gj0][gi0] = target[b][t*21+2] * nH - gj0
tx1[b][best_n][gj0][gi0] = target[b][t*21+3] * nW - gi0
ty1[b][best_n][gj0][gi0] = target[b][t*21+4] * nH - gj0
tx2[b][best_n][gj0][gi0] = target[b][t*21+5] * nW - gi0
ty2[b][best_n][gj0][gi0] = target[b][t*21+6] * nH - gj0
tx3[b][best_n][gj0][gi0] = target[b][t*21+7] * nW - gi0
ty3[b][best_n][gj0][gi0] = target[b][t*21+8] * nH - gj0
tx4[b][best_n][gj0][gi0] = target[b][t*21+9] * nW - gi0
ty4[b][best_n][gj0][gi0] = target[b][t*21+10] * nH - gj0
tx5[b][best_n][gj0][gi0] = target[b][t*21+11] * nW - gi0
ty5[b][best_n][gj0][gi0] = target[b][t*21+12] * nH - gj0
tx6[b][best_n][gj0][gi0] = target[b][t*21+13] * nW - gi0
ty6[b][best_n][gj0][gi0] = target[b][t*21+14] * nH - gj0
tx7[b][best_n][gj0][gi0] = target[b][t*21+15] * nW - gi0
ty7[b][best_n][gj0][gi0] = target[b][t*21+16] * nH - gj0
tx8[b][best_n][gj0][gi0] = target[b][t*21+17] * nW - gi0
ty8[b][best_n][gj0][gi0] = target[b][t*21+18] * nH - gj0
tconf[b][best_n][gj0][gi0] = conf
tcls[b][best_n][gj0][gi0] = target[b][t*21]
if conf > 0.5:
nCorrect = nCorrect + 1
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls
class RegionLoss(nn.Module):
def __init__(self, num_classes=0, anchors=[], num_anchors=5):
super(RegionLoss, self).__init__()
self.num_classes = num_classes
self.anchors = anchors
self.num_anchors = num_anchors
self.anchor_step = len(anchors)//num_anchors
self.coord_scale = 1
self.noobject_scale = 1
self.object_scale = 5
self.class_scale = 1
self.thresh = 0.6
self.seen = 0
def forward(self, output, target):
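# output has shape nB x nA*(19+nC) x nH x nW: per anchor, 18 coordinate channels (9 control points), 1 confidence channel and nC class scores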
# Parameters
t0 = time.time()
nB = output.data.size(0)
nA = self.num_anchors
nC = self.num_classes
nH = output.data.size(2)
nW = output.data.size(3)
# Activation
output = output.view(nB, nA, (19+nC), nH, nW)
x0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW))
y0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW))
x1 = output.index_select(2, Variable(torch.cuda.LongTensor([2]))).view(nB, nA, nH, nW)
y1 = output.index_select(2, Variable(torch.cuda.LongTensor([3]))).view(nB, nA, nH, nW)
x2 = output.index_select(2, Variable(torch.cuda.LongTensor([4]))).view(nB, nA, nH, nW)
y2 = output.index_select(2, Variable(torch.cuda.LongTensor([5]))).view(nB, nA, nH, nW)
x3 = output.index_select(2, Variable(torch.cuda.LongTensor([6]))).view(nB, nA, nH, nW)
y3 = output.index_select(2, Variable(torch.cuda.LongTensor([7]))).view(nB, nA, nH, nW)
x4 = output.index_select(2, Variable(torch.cuda.LongTensor([8]))).view(nB, nA, nH, nW)
y4 = output.index_select(2, Variable(torch.cuda.LongTensor([9]))).view(nB, nA, nH, nW)
x5 = output.index_select(2, Variable(torch.cuda.LongTensor([10]))).view(nB, nA, nH, nW)
y5 = output.index_select(2, Variable(torch.cuda.LongTensor([11]))).view(nB, nA, nH, nW)
x6 = output.index_select(2, Variable(torch.cuda.LongTensor([12]))).view(nB, nA, nH, nW)
y6 = output.index_select(2, Variable(torch.cuda.LongTensor([13]))).view(nB, nA, nH, nW)
x7 = output.index_select(2, Variable(torch.cuda.LongTensor([14]))).view(nB, nA, nH, nW)
y7 = output.index_select(2, Variable(torch.cuda.LongTensor([15]))).view(nB, nA, nH, nW)
x8 = output.index_select(2, Variable(torch.cuda.LongTensor([16]))).view(nB, nA, nH, nW)
y8 = output.index_select(2, Variable(torch.cuda.LongTensor([17]))).view(nB, nA, nH, nW)
conf = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([18]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(19,19+nC-1,nC).long().cuda()))
cls = cls.view(nB*nA, nC, nH*nW).transpose(1,2).contiguous().view(nB*nA*nH*nW, nC)
t1 = time.time()
# Create pred boxes
pred_corners = torch.cuda.FloatTensor(18, nB*nA*nH*nW)
grid_x = torch.linspace(0, nW-1, nW).repeat(nH,1).repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
grid_y = torch.linspace(0, nH-1, nH).repeat(nW,1).t().repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
pred_corners[0] = (x0.data + grid_x) / nW
pred_corners[1] = (y0.data + grid_y) / nH
pred_corners[2] = (x1.data + grid_x) / nW
pred_corners[3] = (y1.data + grid_y) / nH
pred_corners[4] = (x2.data + grid_x) / nW
pred_corners[5] = (y2.data + grid_y) / nH
pred_corners[6] = (x3.data + grid_x) / nW
pred_corners[7] = (y3.data + grid_y) / nH
pred_corners[8] = (x4.data + grid_x) / nW
pred_corners[9] = (y4.data + grid_y) / nH
pred_corners[10] = (x5.data + grid_x) / nW
pred_corners[11] = (y5.data + grid_y) / nH
pred_corners[12] = (x6.data + grid_x) / nW
pred_corners[13] = (y6.data + grid_y) / nH
pred_corners[14] = (x7.data + grid_x) / nW
pred_corners[15] = (y7.data + grid_y) / nH
pred_corners[16] = (x8.data + grid_x) / nW
pred_corners[17] = (y8.data + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,18)
pred_corners = convert2cpu(gpu_matrix)
t2 = time.time()
# Build targets
nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls = \
build_targets(pred_corners, target.data, self.anchors, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
cls_mask = (cls_mask == 1)
nProposals = int((conf > 0.25).sum().item())
tx0 = Variable(tx0.cuda())
ty0 = Variable(ty0.cuda())
tx1 = Variable(tx1.cuda())
ty1 = Variable(ty1.cuda())
tx2 = Variable(tx2.cuda())
ty2 = Variable(ty2.cuda())
tx3 = Variable(tx3.cuda())
ty3 = Variable(ty3.cuda())
tx4 = Variable(tx4.cuda())
ty4 = Variable(ty4.cuda())
tx5 = Variable(tx5.cuda())
ty5 = Variable(ty5.cuda())
tx6 = Variable(tx6.cuda())
ty6 = Variable(ty6.cuda())
tx7 = Variable(tx7.cuda())
ty7 = Variable(ty7.cuda())
tx8 = Variable(tx8.cuda())
ty8 = Variable(ty8.cuda())
tconf = Variable(tconf.cuda())
tcls = Variable(tcls.view(-1)[cls_mask].long().cuda())
coord_mask = Variable(coord_mask.cuda())
conf_mask = Variable(conf_mask.cuda().sqrt())
cls_mask = Variable(cls_mask.view(-1, 1).repeat(1,nC).cuda())
cls = cls[cls_mask].view(-1, nC)
t3 = time.time()
# Create loss
loss_x0 = self.coord_scale * nn.MSELoss(size_average=False)(x0*coord_mask, tx0*coord_mask)/2.0
loss_y0 = self.coord_scale * nn.MSELoss(size_average=False)(y0*coord_mask, ty0*coord_mask)/2.0
loss_x1 = self.coord_scale * nn.MSELoss(size_average=False)(x1*coord_mask, tx1*coord_mask)/2.0
loss_y1 = self.coord_scale * nn.MSELoss(size_average=False)(y1*coord_mask, ty1*coord_mask)/2.0
loss_x2 = self.coord_scale * nn.MSELoss(size_average=False)(x2*coord_mask, tx2*coord_mask)/2.0
loss_y2 = self.coord_scale * nn.MSELoss(size_average=False)(y2*coord_mask, ty2*coord_mask)/2.0
loss_x3 = self.coord_scale * nn.MSELoss(size_average=False)(x3*coord_mask, tx3*coord_mask)/2.0
loss_y3 = self.coord_scale * nn.MSELoss(size_average=False)(y3*coord_mask, ty3*coord_mask)/2.0
loss_x4 = self.coord_scale * nn.MSELoss(size_average=False)(x4*coord_mask, tx4*coord_mask)/2.0
loss_y4 = self.coord_scale * nn.MSELoss(size_average=False)(y4*coord_mask, ty4*coord_mask)/2.0
loss_x5 = self.coord_scale * nn.MSELoss(size_average=False)(x5*coord_mask, tx5*coord_mask)/2.0
loss_y5 = self.coord_scale * nn.MSELoss(size_average=False)(y5*coord_mask, ty5*coord_mask)/2.0
loss_x6 = self.coord_scale * nn.MSELoss(size_average=False)(x6*coord_mask, tx6*coord_mask)/2.0
loss_y6 = self.coord_scale * nn.MSELoss(size_average=False)(y6*coord_mask, ty6*coord_mask)/2.0
loss_x7 = self.coord_scale * nn.MSELoss(size_average=False)(x7*coord_mask, tx7*coord_mask)/2.0
loss_y7 = self.coord_scale * nn.MSELoss(size_average=False)(y7*coord_mask, ty7*coord_mask)/2.0
loss_x8 = self.coord_scale * nn.MSELoss(size_average=False)(x8*coord_mask, tx8*coord_mask)/2.0
loss_y8 = self.coord_scale * nn.MSELoss(size_average=False)(y8*coord_mask, ty8*coord_mask)/2.0
loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
loss_x = loss_x0 + loss_x1 + loss_x2 + loss_x3 + loss_x4 + loss_x5 + loss_x6 + loss_x7 + loss_x8
loss_y = loss_y0 + loss_y1 + loss_y2 + loss_y3 + loss_y4 + loss_y5 + loss_y6 + loss_y7 + loss_y8
loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
loss = loss_x + loss_y + loss_conf + loss_cls
print('%d: nGT %d, recall %d, proposals %d, loss: x0: %f x %f, y0: %f y %f, conf %f, cls %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x0.item(), loss_x.item(), loss_y0.item(), loss_y.item(), loss_conf.item(), loss_cls.item(), loss.item()))
#else:
# loss = loss_x + loss_y + loss_conf
# print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss.data[0]))
t4 = time.time()
if False:
print('-----------------------------------')
print(' activation : %f' % (t1 - t0))
print(' create pred_corners : %f' % (t2 - t1))
print(' build targets : %f' % (t3 - t2))
print(' create loss : %f' % (t4 - t3))
print(' total : %f' % (t4 - t0))
return loss


@@ -0,0 +1,424 @@
from __future__ import print_function
import os
os.sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
import numpy as np
import random
import math
import shutil
from torchvision import datasets, transforms
from torch.autograd import Variable # Useful info about autograd: http://pytorch.org/docs/master/notes/autograd.html
from darknet_multi import Darknet
from MeshPly import MeshPly
from utils import *
from cfg import parse_cfg
import dataset_multi
from region_loss_multi import RegionLoss
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
# Adjust learning rate during training, learning schedule can be changed in network config file
def adjust_learning_rate(optimizer, batch):
lr = learning_rate
for i in range(len(steps)):
scale = scales[i] if i < len(scales) else 1
if batch >= steps[i]:
lr = lr * scale
if batch == steps[i]:
break
else:
break
for param_group in optimizer.param_groups:
param_group['lr'] = lr/batch_size
return lr
def train(epoch):
global processed_batches
# Initialize timer
t0 = time.time()
# Get the dataloader for training dataset
train_loader = torch.utils.data.DataLoader(dataset_multi.listDataset(trainlist, shape=(init_width, init_height),
shuffle=True,
transform=transforms.Compose([transforms.ToTensor(),]),
train=True,
seen=model.module.seen,
batch_size=batch_size,
num_workers=num_workers, bg_file_names=bg_file_names),
batch_size=batch_size, shuffle=False, **kwargs)
# TRAINING
lr = adjust_learning_rate(optimizer, processed_batches)
logging('epoch %d, processed %d samples, lr %f' % (epoch, epoch * len(train_loader.dataset), lr))
# Start training
model.train()
t1 = time.time()
avg_time = torch.zeros(9)
niter = 0
# Iterate through batches
for batch_idx, (data, target) in enumerate(train_loader):
t2 = time.time()
# adjust learning rate
adjust_learning_rate(optimizer, processed_batches)
processed_batches = processed_batches + 1
# Pass the data to GPU
if use_cuda:
data = data.cuda()
t3 = time.time()
# Wrap tensors in Variable class for automatic differentiation
data, target = Variable(data), Variable(target)
t4 = time.time()
# Zero the gradients before running the backward pass
optimizer.zero_grad()
t5 = time.time()
# Forward pass
output = model(data)
t6 = time.time()
region_loss.seen = region_loss.seen + data.data.size(0)
# Compute loss, grow an array of losses for saving later on
loss = region_loss(output, target)
training_iters.append(epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter)
training_losses.append(convert2cpu(loss.data))
niter += 1
t7 = time.time()
# Backprop: compute gradient of the loss with respect to model parameters
loss.backward()
t8 = time.time()
# Update weights
optimizer.step()
t9 = time.time()
# Print time statistics
if False and batch_idx > 1:
avg_time[0] = avg_time[0] + (t2-t1)
avg_time[1] = avg_time[1] + (t3-t2)
avg_time[2] = avg_time[2] + (t4-t3)
avg_time[3] = avg_time[3] + (t5-t4)
avg_time[4] = avg_time[4] + (t6-t5)
avg_time[5] = avg_time[5] + (t7-t6)
avg_time[6] = avg_time[6] + (t8-t7)
avg_time[7] = avg_time[7] + (t9-t8)
avg_time[8] = avg_time[8] + (t9-t1)
print('-------------------------------')
print(' load data : %f' % (avg_time[0]/(batch_idx)))
print(' cpu to cuda : %f' % (avg_time[1]/(batch_idx)))
print('cuda to variable : %f' % (avg_time[2]/(batch_idx)))
print(' zero_grad : %f' % (avg_time[3]/(batch_idx)))
print(' forward feature : %f' % (avg_time[4]/(batch_idx)))
print(' forward loss : %f' % (avg_time[5]/(batch_idx)))
print(' backward : %f' % (avg_time[6]/(batch_idx)))
print(' step : %f' % (avg_time[7]/(batch_idx)))
print(' total : %f' % (avg_time[8]/(batch_idx)))
t1 = time.time()
t1 = time.time()
return epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter - 1
def eval(niter, datacfg, cfgfile):
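# Evaluate the current multi-object model on one object class of the OCCLUSION dataset and log the 2D reprojection accuracy at several pixel thresholds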
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
return i
# Parse configuration files
options = read_data_cfg(datacfg)
valid_images = options['valid']
meshname = options['mesh']
backupdir = options['backup']
name = options['name']
prefix = 'results'
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
# Get validation file names
with open(valid_images) as fp:
tmp_files = fp.readlines()
valid_files = [item.rstrip() for item in tmp_files]
# Specify model, load pretrained weights, pass to GPU and set the module in evaluation mode
model.eval()
# Get the parser for the test dataset
valid_dataset = dataset_multi.listDataset(valid_images, shape=(model.module.width, model.module.height),
shuffle=False,
objclass=name,
transform=transforms.Compose([
transforms.ToTensor(),
]))
valid_batchsize = 1
# Specify the number of workers for multiple processing, get the dataloader for the test dataset
kwargs = {'num_workers': 4, 'pin_memory': True}
test_loader = torch.utils.data.DataLoader(
valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs)
# Parameters
num_classes = model.module.num_classes
anchors = model.module.anchors
num_anchors = model.module.num_anchors
testing_error_pixel = 0.0
testing_samples = 0.0
errs_2d = []
logging(" Number of test samples: %d" % len(test_loader.dataset))
# Iterate through test examples
for batch_idx, (data, target) in enumerate(test_loader):
t1 = time.time()
# Pass the data to GPU
if use_cuda:
data = data.cuda()
target = target.cuda()
# Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference
data = Variable(data, volatile=True)
t2 = time.time()
# Forward pass
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
trgt = target[0].view(-1, 21)
all_boxes = get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
t4 = time.time()
# Iterate through all batch elements
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est) and (boxes[j][20] == int(truths[k][0])):
best_conf_est = boxes[j][18]
box_pr = boxes[j]
bb2d_gt = get_2d_bb(box_gt[:18], output.size(3))
bb2d_pr = get_2d_bb(box_pr[:18], output.size(3))
iou = bbox_iou(bb2d_gt, bb2d_pr)
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
corners2D_pr[:, 1] = corners2D_pr[:, 1] * im_height
corners2D_gt_corrected = fix_corner_order(corners2D_gt) # Fix the order of the corners in OCCLUSION
# Compute [R|t] by pnp
objpoints3D = np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32')
K = np.array(internal_calibration, dtype='float32')
R_gt, t_gt = pnp(objpoints3D, corners2D_gt_corrected, K)
R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration))
proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration))
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
# Sum errors
testing_error_pixel += pixel_dist
testing_samples += 1
t5 = time.time()
# Compute 2D reprojection score
for px_threshold in [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]:
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
if True:
logging('-----------------------------------')
logging(' tensor to cuda : %f' % (t2 - t1))
logging(' predict : %f' % (t3 - t2))
logging('get_region_boxes : %f' % (t4 - t3))
logging(' eval : %f' % (t5 - t4))
logging(' total : %f' % (t5 - t1))
logging('-----------------------------------')
# Register losses and errors for saving later on
testing_iters.append(niter)
testing_errors_pixel.append(testing_error_pixel/(float(testing_samples)+eps))
testing_accuracies.append(acc)
def test(niter):
cfgfile = 'cfg/yolo-pose-multi.cfg'
datacfg = 'cfg/ape_occlusion.data'
logging("Testing ape...")
eval(niter, datacfg, cfgfile)
datacfg = 'cfg/can_occlusion.data'
logging("Testing can...")
eval(niter, datacfg, cfgfile)
datacfg = 'cfg/cat_occlusion.data'
logging("Testing cat...")
eval(niter, datacfg, cfgfile)
datacfg = 'cfg/duck_occlusion.data'
logging("Testing duck...")
eval(niter, datacfg, cfgfile)
datacfg = 'cfg/driller_occlusion.data'
logging("Testing driller...")
eval(niter, datacfg, cfgfile)
datacfg = 'cfg/glue_occlusion.data'
logging("Testing glue...")
eval(niter, datacfg, cfgfile)
# datacfg = 'cfg/holepuncher_occlusion.data'
# logging("Testing holepuncher...")
# eval(niter, datacfg, cfgfile)
if __name__ == "__main__":
# Training settings
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
# Parse configuration files
data_options = read_data_cfg(datacfg)
net_options = parse_cfg(cfgfile)[0]
trainlist = data_options['train']
nsamples = file_lines(trainlist)
gpus = data_options['gpus'] # e.g. 0,1,2,3
gpus = '0'
num_workers = int(data_options['num_workers'])
backupdir = data_options['backup']
if not os.path.exists(backupdir):
makedirs(backupdir)
batch_size = int(net_options['batch'])
max_batches = int(net_options['max_batches'])
learning_rate = float(net_options['learning_rate'])
momentum = float(net_options['momentum'])
decay = float(net_options['decay'])
steps = [float(step) for step in net_options['steps'].split(',')]
scales = [float(scale) for scale in net_options['scales'].split(',')]
bg_file_names = get_all_files('../VOCdevkit/VOC2012/JPEGImages')
# Train parameters
max_epochs = 700 # max_batches*batch_size/nsamples+1
use_cuda = True
seed = int(time.time())
eps = 1e-5
save_interval = 10 # epoches
dot_interval = 70 # batches
best_acc = -1
# Test parameters
conf_thresh = 0.05
nms_thresh = 0.4
match_thresh = 0.5
iou_thresh = 0.5
im_width = 640
im_height = 480
# Specify which gpus to use
torch.manual_seed(seed)
if use_cuda:
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
# Specify the model and the loss
model = Darknet(cfgfile)
region_loss = model.loss
# Model settings
# model.load_weights(weightfile)
model.load_weights_until_last(weightfile)
model.print_network()
model.seen = 0
region_loss.iter = model.iter
region_loss.seen = model.seen
processed_batches = model.seen//batch_size
init_width = model.width
init_height = model.height
init_epoch = model.seen//nsamples
# Variable to save
training_iters = []
training_losses = []
testing_iters = []
testing_errors_pixel = []
testing_accuracies = []
# Specify the number of workers
kwargs = {'num_workers': num_workers, 'pin_memory': True} if use_cuda else {}
# Pass the model to GPU
if use_cuda:
# model = model.cuda()
model = torch.nn.DataParallel(model, device_ids=[0]).cuda() # Multiple GPU parallelism
# Get the optimizer
params_dict = dict(model.named_parameters())
params = []
for key, value in params_dict.items():
if key.find('.bn') >= 0 or key.find('.bias') >= 0:
params += [{'params': [value], 'weight_decay': 0.0}]
else:
params += [{'params': [value], 'weight_decay': decay*batch_size}]
optimizer = optim.SGD(model.parameters(), lr=learning_rate/batch_size, momentum=momentum, dampening=0, weight_decay=decay*batch_size)
# optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimization
evaluate = False
if evaluate:
logging('evaluating ...')
test(0)
else:
for epoch in range(init_epoch, max_epochs):
# TRAIN
niter = train(epoch)
# TEST and SAVE
if (epoch % 20 == 0) and (epoch != 0):
test(niter)
logging('save training stats to %s/costs.npz' % (backupdir))
np.savez(os.path.join(backupdir, "costs.npz"),
training_iters=training_iters,
training_losses=training_losses,
testing_iters=testing_iters,
testing_accuracies=testing_accuracies,
testing_errors_pixel=testing_errors_pixel)
if (np.mean(testing_accuracies[-5:]) > best_acc ):
best_acc = np.mean(testing_accuracies[-5:])
logging('best model so far!')
logging('save weights to %s/model.weights' % (backupdir))
model.module.save_weights('%s/model.weights' % (backupdir))
shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))


@@ -0,0 +1,343 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import os\n",
"os.sys.path.append('..')\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"1\"\n",
"import torch\n",
"from torch.autograd import Variable\n",
"from torchvision import datasets, transforms\n",
"from scipy.misc import imsave\n",
"import scipy.io\n",
"import warnings\n",
"import sys\n",
"warnings.filterwarnings(\"ignore\")\n",
"import matplotlib.pyplot as plt\n",
"import scipy.misc\n",
"\n",
"from darknet_multi import Darknet\n",
"from utils import *\n",
"import dataset_multi\n",
"from MeshPly import MeshPly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2018-05-06 14:09:50 Testing ape...\n",
"2018-05-06 14:10:15 Acc using 5 px 2D Projection = 7.01%\n",
"2018-05-06 14:10:15 Acc using 10 px 2D Projection = 40.43%\n",
"2018-05-06 14:10:15 Acc using 15 px 2D Projection = 59.83%\n",
"2018-05-06 14:10:15 Acc using 20 px 2D Projection = 68.55%\n",
"2018-05-06 14:10:15 Acc using 25 px 2D Projection = 72.05%\n",
"2018-05-06 14:10:15 Acc using 30 px 2D Projection = 73.68%\n",
"2018-05-06 14:10:15 Acc using 35 px 2D Projection = 74.53%\n",
"2018-05-06 14:10:15 Acc using 40 px 2D Projection = 75.13%\n",
"2018-05-06 14:10:15 Acc using 45 px 2D Projection = 75.73%\n",
"2018-05-06 14:10:15 Acc using 50 px 2D Projection = 76.50%\n",
"2018-05-06 14:10:18 Testing can...\n",
"2018-05-06 14:10:47 Acc using 5 px 2D Projection = 11.18%\n",
"2018-05-06 14:10:47 Acc using 10 px 2D Projection = 57.83%\n",
"2018-05-06 14:10:47 Acc using 15 px 2D Projection = 79.95%\n",
"2018-05-06 14:10:47 Acc using 20 px 2D Projection = 85.75%\n",
"2018-05-06 14:10:47 Acc using 25 px 2D Projection = 88.73%\n",
"2018-05-06 14:10:47 Acc using 30 px 2D Projection = 90.39%\n",
"2018-05-06 14:10:47 Acc using 35 px 2D Projection = 91.80%\n",
"2018-05-06 14:10:47 Acc using 40 px 2D Projection = 93.21%\n",
"2018-05-06 14:10:47 Acc using 45 px 2D Projection = 93.62%\n",
"2018-05-06 14:10:47 Acc using 50 px 2D Projection = 93.79%\n",
"2018-05-06 14:10:50 Testing cat...\n",
"2018-05-06 14:11:16 Acc using 5 px 2D Projection = 3.62%\n",
"2018-05-06 14:11:16 Acc using 10 px 2D Projection = 23.25%\n",
"2018-05-06 14:11:16 Acc using 15 px 2D Projection = 39.51%\n",
"2018-05-06 14:11:16 Acc using 20 px 2D Projection = 49.45%\n",
"2018-05-06 14:11:16 Acc using 25 px 2D Projection = 54.76%\n",
"2018-05-06 14:11:16 Acc using 30 px 2D Projection = 57.96%\n",
"2018-05-06 14:11:16 Acc using 35 px 2D Projection = 59.56%\n",
"2018-05-06 14:11:16 Acc using 40 px 2D Projection = 60.99%\n",
"2018-05-06 14:11:16 Acc using 45 px 2D Projection = 62.51%\n",
"2018-05-06 14:11:16 Acc using 50 px 2D Projection = 63.27%\n",
"2018-05-06 14:11:19 Testing duck...\n",
"2018-05-06 14:11:42 Acc using 5 px 2D Projection = 5.07%\n",
"2018-05-06 14:11:42 Acc using 10 px 2D Projection = 18.20%\n",
"2018-05-06 14:11:42 Acc using 15 px 2D Projection = 30.88%\n",
"2018-05-06 14:11:42 Acc using 20 px 2D Projection = 55.12%\n",
"2018-05-06 14:11:42 Acc using 25 px 2D Projection = 75.15%\n",
"2018-05-06 14:11:42 Acc using 30 px 2D Projection = 81.45%\n",
"2018-05-06 14:11:42 Acc using 35 px 2D Projection = 83.20%\n",
"2018-05-06 14:11:42 Acc using 40 px 2D Projection = 83.64%\n",
"2018-05-06 14:11:42 Acc using 45 px 2D Projection = 83.90%\n",
"2018-05-06 14:11:42 Acc using 50 px 2D Projection = 84.16%\n",
"2018-05-06 14:11:45 Testing driller...\n",
"2018-05-06 14:12:10 Acc using 5 px 2D Projection = 1.40%\n",
"2018-05-06 14:12:10 Acc using 10 px 2D Projection = 17.38%\n",
"2018-05-06 14:12:10 Acc using 15 px 2D Projection = 39.87%\n",
"2018-05-06 14:12:10 Acc using 20 px 2D Projection = 62.93%\n",
"2018-05-06 14:12:10 Acc using 25 px 2D Projection = 80.64%\n",
"2018-05-06 14:12:10 Acc using 30 px 2D Projection = 89.87%\n",
"2018-05-06 14:12:10 Acc using 35 px 2D Projection = 94.89%\n",
"2018-05-06 14:12:10 Acc using 40 px 2D Projection = 95.88%\n",
"2018-05-06 14:12:10 Acc using 45 px 2D Projection = 96.54%\n",
"2018-05-06 14:12:10 Acc using 50 px 2D Projection = 96.87%\n",
"2018-05-06 14:12:13 Testing glue...\n",
"2018-05-06 14:12:31 Acc using 5 px 2D Projection = 6.53%\n",
"2018-05-06 14:12:31 Acc using 10 px 2D Projection = 26.91%\n",
"2018-05-06 14:12:31 Acc using 15 px 2D Projection = 39.65%\n",
"2018-05-06 14:12:31 Acc using 20 px 2D Projection = 46.18%\n",
"2018-05-06 14:12:31 Acc using 25 px 2D Projection = 49.50%\n",
"2018-05-06 14:12:31 Acc using 30 px 2D Projection = 51.83%\n",
"2018-05-06 14:12:31 Acc using 35 px 2D Projection = 53.05%\n",
"2018-05-06 14:12:31 Acc using 40 px 2D Projection = 53.16%\n",
"2018-05-06 14:12:31 Acc using 45 px 2D Projection = 53.93%\n",
"2018-05-06 14:12:31 Acc using 50 px 2D Projection = 54.71%\n",
"2018-05-06 14:12:45 Testing holepuncher...\n",
"2018-05-06 14:19:31 Acc using 5 px 2D Projection = 8.26%\n",
"2018-05-06 14:19:31 Acc using 10 px 2D Projection = 39.50%\n",
"2018-05-06 14:19:31 Acc using 15 px 2D Projection = 53.31%\n",
"2018-05-06 14:19:31 Acc using 20 px 2D Projection = 62.56%\n",
"2018-05-06 14:19:31 Acc using 25 px 2D Projection = 68.02%\n",
"2018-05-06 14:19:31 Acc using 30 px 2D Projection = 74.71%\n",
"2018-05-06 14:19:31 Acc using 35 px 2D Projection = 80.74%\n",
"2018-05-06 14:19:31 Acc using 40 px 2D Projection = 85.62%\n",
"2018-05-06 14:19:31 Acc using 45 px 2D Projection = 89.59%\n",
"2018-05-06 14:19:31 Acc using 50 px 2D Projection = 91.49%\n"
]
}
],
"source": [
"def valid(datacfg, cfgfile, weightfile, conf_th):\n",
" def truths_length(truths):\n",
" for i in range(50):\n",
" if truths[i][1] == 0:\n",
" return i\n",
"\n",
" # Parse configuration files\n",
" options = read_data_cfg(datacfg)\n",
" valid_images = options['valid']\n",
" meshname = options['mesh']\n",
" backupdir = options['backup']\n",
" name = options['name']\n",
" prefix = 'results'\n",
" # Read object model information, get 3D bounding box corners\n",
" mesh = MeshPly(meshname)\n",
" vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()\n",
" corners3D = get_3D_corners(vertices)\n",
" # Read intrinsic camera parameters\n",
" internal_calibration = get_camera_intrinsic()\n",
"\n",
" # Get validation file names\n",
" with open(valid_images) as fp:\n",
" tmp_files = fp.readlines()\n",
" valid_files = [item.rstrip() for item in tmp_files]\n",
" \n",
" # Specicy model, load pretrained weights, pass to GPU and set the module in evaluation mode\n",
" model = Darknet(cfgfile)\n",
" model.load_weights(weightfile)\n",
" model.cuda()\n",
" model.eval()\n",
"\n",
" # Get the parser for the test dataset\n",
" valid_dataset = dataset_multi.listDataset(valid_images, shape=(model.width, model.height),\n",
" shuffle=False,\n",
" objclass=name,\n",
" transform=transforms.Compose([\n",
" transforms.ToTensor(),\n",
" ]))\n",
" valid_batchsize = 1\n",
"\n",
" # Specify the number of workers for multiple processing, get the dataloader for the test dataset\n",
" kwargs = {'num_workers': 4, 'pin_memory': True}\n",
" test_loader = torch.utils.data.DataLoader(\n",
" valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs) \n",
"\n",
" # Parameters\n",
" visualize = False\n",
" use_cuda = True\n",
" num_classes = 13\n",
" anchors = [1.4820, 2.2412, 2.0501, 3.1265, 2.3946, 4.6891, 3.1018, 3.9910, 3.4879, 5.8851]\n",
" num_anchors = 5\n",
" eps = 1e-5\n",
" conf_thresh = conf_th\n",
" iou_thresh = 0.5\n",
"\n",
" # Parameters to save\n",
" errs_2d = []\n",
" edges = [[1, 2], [1, 3], [1, 5], [2, 4], [2, 6], [3, 4], [3, 7], [4, 8], [5, 6], [5, 7], [6, 8], [7, 8]]\n",
" edges_corners = [[0, 1], [0, 2], [0, 4], [1, 3], [1, 5], [2, 3], [2, 6], [3, 7], [4, 5], [4, 6], [5, 7], [6, 7]]\n",
"\n",
" # Iterate through test batches (Batch size for test data is 1)\n",
" count = 0\n",
" logging('Testing {}...'.format(name))\n",
" for batch_idx, (data, target) in enumerate(test_loader):\n",
" \n",
" # Images\n",
" img = data[0, :, :, :]\n",
" img = img.numpy().squeeze()\n",
" img = np.transpose(img, (1, 2, 0))\n",
" \n",
" t1 = time.time()\n",
" # Pass data to GPU\n",
" if use_cuda:\n",
" data = data.cuda()\n",
" target = target.cuda()\n",
" \n",
" # Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference\n",
" data = Variable(data, volatile=True)\n",
" t2 = time.time()\n",
" \n",
" # Forward pass\n",
" output = model(data).data \n",
" t3 = time.time()\n",
" \n",
" # Using confidence threshold, eliminate low-confidence predictions\n",
" trgt = target[0].view(-1, 21)\n",
" all_boxes = get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, int(trgt[0][0]), only_objectness=0) \n",
" t4 = time.time()\n",
" \n",
" # Iterate through all images in the batch\n",
" for i in range(output.size(0)):\n",
" \n",
" # For each image, get all the predictions\n",
" boxes = all_boxes[i]\n",
" \n",
" # For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)\n",
" truths = target[i].view(-1, 21)\n",
" \n",
" # Get how many object are present in the scene\n",
" num_gts = truths_length(truths)\n",
"\n",
" # Iterate through each ground-truth object\n",
" for k in range(num_gts):\n",
" box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6], \n",
" truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12], \n",
" truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]\n",
" best_conf_est = -1\n",
" \n",
"\n",
" # If the prediction has the highest confidence, choose it as our prediction\n",
" for j in range(len(boxes)):\n",
" if (boxes[j][18] > best_conf_est) and (boxes[j][20] == int(truths[k][0])):\n",
" best_conf_est = boxes[j][18]\n",
" box_pr = boxes[j]\n",
" bb2d_gt = get_2d_bb(box_gt[:18], output.size(3))\n",
" bb2d_pr = get_2d_bb(box_pr[:18], output.size(3))\n",
" iou = bbox_iou(bb2d_gt, bb2d_pr)\n",
" match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))\n",
" \n",
" # Denormalize the corner predictions \n",
" corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')\n",
" corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')\n",
" corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640\n",
" corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480 \n",
" corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640\n",
" corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480\n",
" corners2D_gt_corrected = fix_corner_order(corners2D_gt) # Fix the order of the corners in OCCLUSION\n",
" \n",
" # Compute [R|t] by pnp\n",
" objpoints3D = np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32')\n",
" K = np.array(internal_calibration, dtype='float32')\n",
" R_gt, t_gt = pnp(objpoints3D, corners2D_gt_corrected, K)\n",
" R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)\n",
" \n",
" # Compute pixel error\n",
" Rt_gt = np.concatenate((R_gt, t_gt), axis=1)\n",
" Rt_pr = np.concatenate((R_pr, t_pr), axis=1)\n",
" proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration) \n",
" proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration) \n",
" proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration)) \n",
" proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration)) \n",
" norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)\n",
" pixel_dist = np.mean(norm)\n",
" errs_2d.append(pixel_dist)\n",
"\n",
" \n",
" if visualize:\n",
" # Visualize\n",
" plt.xlim((0, 640))\n",
" plt.ylim((0, 480))\n",
" plt.imshow(scipy.misc.imresize(img, (480, 640)))\n",
" # Projections\n",
" for edge in edges_corners:\n",
" plt.plot(proj_corners_gt[edge, 0], proj_corners_gt[edge, 1], color='g', linewidth=3.0)\n",
" plt.plot(proj_corners_pr[edge, 0], proj_corners_pr[edge, 1], color='b', linewidth=3.0)\n",
" plt.gca().invert_yaxis()\n",
" plt.show()\n",
"\n",
" t5 = time.time()\n",
"\n",
" # Compute 2D projection score\n",
" for px_threshold in [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]:\n",
" acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)\n",
" # Print test statistics\n",
" logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))\n",
"\n",
"conf_th = 0.05\n",
"cfgfile = 'cfg/yolo-pose-multi.cfg'\n",
"weightfile = 'backup_multi/model_backup2.weights'\n",
"datacfg = 'cfg/ape_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/can_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/cat_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/duck_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/driller_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/glue_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"datacfg = 'cfg/holepuncher_occlusion.data'\n",
"valid(datacfg, cfgfile, weightfile, conf_th)\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,183 @@
import os
os.sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import time
import torch
import numpy as np
from torch.autograd import Variable
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import scipy.misc
import warnings
warnings.filterwarnings("ignore")
from darknet_multi import Darknet
from utils import *
import dataset_multi
from MeshPly import MeshPly
def valid(datacfg, cfgfile, weightfile, conf_th):
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
return i
# Parse configuration files
options = read_data_cfg(datacfg)
valid_images = options['valid']
meshname = options['mesh']
name = options['name']
prefix = 'results'
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
diam = float(options['diam'])
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
# Get validation file names
with open(valid_images) as fp:
tmp_files = fp.readlines()
valid_files = [item.rstrip() for item in tmp_files]
# Specify model, load pretrained weights, pass to GPU and set the module in evaluation mode
model = Darknet(cfgfile)
model.load_weights(weightfile)
model.cuda()
model.eval()
# Get the parser for the test dataset
valid_dataset = dataset_multi.listDataset(valid_images, shape=(model.width, model.height),
shuffle=False,
objclass=name,
transform=transforms.Compose([
transforms.ToTensor(),
]))
valid_batchsize = 1
# Specify the number of workers for multiple processing, get the dataloader for the test dataset
kwargs = {'num_workers': 4, 'pin_memory': True}
test_loader = torch.utils.data.DataLoader(
valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs)
# Parameters
use_cuda = True
num_classes = 13
anchors = [1.4820, 2.2412, 2.0501, 3.1265, 2.3946, 4.6891, 3.1018, 3.9910, 3.4879, 5.8851]
num_anchors = 5
eps = 1e-5
conf_thresh = conf_th
iou_thresh = 0.5
# Parameters to save
errs_2d = []
edges = [[1, 2], [1, 3], [1, 5], [2, 4], [2, 6], [3, 4], [3, 7], [4, 8], [5, 6], [5, 7], [6, 8], [7, 8]]
edges_corners = [[0, 1], [0, 2], [0, 4], [1, 3], [1, 5], [2, 3], [2, 6], [3, 7], [4, 5], [4, 6], [5, 7], [6, 7]]
# Iterate through test batches (Batch size for test data is 1)
logging('Testing {}...'.format(name))
for batch_idx, (data, target) in enumerate(test_loader):
t1 = time.time()
# Pass data to GPU
if use_cuda:
data = data.cuda()
# target = target.cuda()
# Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference
data = Variable(data, volatile=True)
t2 = time.time()
# Forward pass
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
trgt = target[0].view(-1, 21)
all_boxes = get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, int(trgt[0][0]), only_objectness=0)
t4 = time.time()
# Iterate through all images in the batch
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est) and (boxes[j][20] == int(truths[k][0])):
best_conf_est = boxes[j][18]
box_pr = boxes[j]
bb2d_gt = get_2d_bb(box_gt[:18], output.size(3))
bb2d_pr = get_2d_bb(box_pr[:18], output.size(3))
iou = bbox_iou(bb2d_gt, bb2d_pr)
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640
corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480
corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640
corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480
corners2D_gt_corrected = fix_corner_order(corners2D_gt) # Fix the order of corners
# Compute [R|t] by pnp
objpoints3D = np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32')
K = np.array(internal_calibration, dtype='float32')
R_gt, t_gt = pnp(objpoints3D, corners2D_gt_corrected, K)
R_pr, t_pr = pnp(objpoints3D, corners2D_pr, K)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration))
proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration))
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
t5 = time.time()
# Compute 2D projection score
for px_threshold in [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]:
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
# Print test statistics
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
if __name__ == '__main__' and __package__ is None:
import sys
if len(sys.argv) == 3:
conf_th = 0.05
cfgfile = sys.argv[1]
weightfile = sys.argv[2]
datacfg = 'cfg/ape_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/can_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/cat_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/duck_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/glue_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
datacfg = 'cfg/holepuncher_occlusion.data'
valid(datacfg, cfgfile, weightfile, conf_th)
else:
print('Usage:')
print(' python valid.py cfgfile weightfile')

301
py2/region_loss.py Normal file

@@ -0,0 +1,301 @@
import time
import torch
import math
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from utils import *
def build_targets(pred_corners, target, anchors, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
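# Builds regression targets for one batch: for every image it first zeroes conf_mask for grid cells whose predicted corners already match a ground truth better than sil_thresh, then, for each labelled object, marks the cell containing the first keypoint (the centroid) as responsible and stores the per-keypoint x/y offsets relative to that cell together with the target confidence and class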
nB = target.size(0)
nA = num_anchors
nC = num_classes
anchor_step = len(anchors)/num_anchors
conf_mask = torch.ones(nB, nA, nH, nW) * noobject_scale
coord_mask = torch.zeros(nB, nA, nH, nW)
cls_mask = torch.zeros(nB, nA, nH, nW)
tx0 = torch.zeros(nB, nA, nH, nW)
ty0 = torch.zeros(nB, nA, nH, nW)
tx1 = torch.zeros(nB, nA, nH, nW)
ty1 = torch.zeros(nB, nA, nH, nW)
tx2 = torch.zeros(nB, nA, nH, nW)
ty2 = torch.zeros(nB, nA, nH, nW)
tx3 = torch.zeros(nB, nA, nH, nW)
ty3 = torch.zeros(nB, nA, nH, nW)
tx4 = torch.zeros(nB, nA, nH, nW)
ty4 = torch.zeros(nB, nA, nH, nW)
tx5 = torch.zeros(nB, nA, nH, nW)
ty5 = torch.zeros(nB, nA, nH, nW)
tx6 = torch.zeros(nB, nA, nH, nW)
ty6 = torch.zeros(nB, nA, nH, nW)
tx7 = torch.zeros(nB, nA, nH, nW)
ty7 = torch.zeros(nB, nA, nH, nW)
tx8 = torch.zeros(nB, nA, nH, nW)
ty8 = torch.zeros(nB, nA, nH, nW)
tconf = torch.zeros(nB, nA, nH, nW)
tcls = torch.zeros(nB, nA, nH, nW)
nAnchors = nA*nH*nW
nPixels = nH*nW
for b in range(nB):
cur_pred_corners = pred_corners[b*nAnchors:(b+1)*nAnchors].t()
cur_confs = torch.zeros(nAnchors)
for t in range(50):
if target[b][t*21+1] == 0:
break
gx0 = target[b][t*21+1]*nW
gy0 = target[b][t*21+2]*nH
gx1 = target[b][t*21+3]*nW
gy1 = target[b][t*21+4]*nH
gx2 = target[b][t*21+5]*nW
gy2 = target[b][t*21+6]*nH
gx3 = target[b][t*21+7]*nW
gy3 = target[b][t*21+8]*nH
gx4 = target[b][t*21+9]*nW
gy4 = target[b][t*21+10]*nH
gx5 = target[b][t*21+11]*nW
gy5 = target[b][t*21+12]*nH
gx6 = target[b][t*21+13]*nW
gy6 = target[b][t*21+14]*nH
gx7 = target[b][t*21+15]*nW
gy7 = target[b][t*21+16]*nH
gx8 = target[b][t*21+17]*nW
gy8 = target[b][t*21+18]*nH
cur_gt_corners = torch.FloatTensor([gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]).repeat(nAnchors,1).t() # 18 x nAnchors
cur_confs = torch.max(cur_confs, corner_confidences9(cur_pred_corners, cur_gt_corners)) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
conf_mask[b][cur_confs>sil_thresh] = 0
if seen < -1:#6400:
tx0.fill_(0.5)
ty0.fill_(0.5)
tx1.fill_(0.5)
ty1.fill_(0.5)
tx2.fill_(0.5)
ty2.fill_(0.5)
tx3.fill_(0.5)
ty3.fill_(0.5)
tx4.fill_(0.5)
ty4.fill_(0.5)
tx5.fill_(0.5)
ty5.fill_(0.5)
tx6.fill_(0.5)
ty6.fill_(0.5)
tx7.fill_(0.5)
ty7.fill_(0.5)
tx8.fill_(0.5)
ty8.fill_(0.5)
coord_mask.fill_(1)
nGT = 0
nCorrect = 0
for b in range(nB):
for t in range(50):
if target[b][t*21+1] == 0:
break
nGT = nGT + 1
best_iou = 0.0
best_n = -1
min_dist = 10000
gx0 = target[b][t*21+1] * nW
gy0 = target[b][t*21+2] * nH
gi0 = int(gx0)
gj0 = int(gy0)
gx1 = target[b][t*21+3] * nW
gy1 = target[b][t*21+4] * nH
gx2 = target[b][t*21+5] * nW
gy2 = target[b][t*21+6] * nH
gx3 = target[b][t*21+7] * nW
gy3 = target[b][t*21+8] * nH
gx4 = target[b][t*21+9] * nW
gy4 = target[b][t*21+10] * nH
gx5 = target[b][t*21+11] * nW
gy5 = target[b][t*21+12] * nH
gx6 = target[b][t*21+13] * nW
gy6 = target[b][t*21+14] * nH
gx7 = target[b][t*21+15] * nW
gy7 = target[b][t*21+16] * nH
gx8 = target[b][t*21+17] * nW
gy8 = target[b][t*21+18] * nH
best_n = 0 # 1 anchor box
gt_box = [gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]
pred_box = pred_corners[b*nAnchors+best_n*nPixels+gj0*nW+gi0]
conf = corner_confidence9(gt_box, pred_box)
coord_mask[b][best_n][gj0][gi0] = 1
cls_mask[b][best_n][gj0][gi0] = 1
conf_mask[b][best_n][gj0][gi0] = object_scale
tx0[b][best_n][gj0][gi0] = target[b][t*21+1] * nW - gi0
ty0[b][best_n][gj0][gi0] = target[b][t*21+2] * nH - gj0
tx1[b][best_n][gj0][gi0] = target[b][t*21+3] * nW - gi0
ty1[b][best_n][gj0][gi0] = target[b][t*21+4] * nH - gj0
tx2[b][best_n][gj0][gi0] = target[b][t*21+5] * nW - gi0
ty2[b][best_n][gj0][gi0] = target[b][t*21+6] * nH - gj0
tx3[b][best_n][gj0][gi0] = target[b][t*21+7] * nW - gi0
ty3[b][best_n][gj0][gi0] = target[b][t*21+8] * nH - gj0
tx4[b][best_n][gj0][gi0] = target[b][t*21+9] * nW - gi0
ty4[b][best_n][gj0][gi0] = target[b][t*21+10] * nH - gj0
tx5[b][best_n][gj0][gi0] = target[b][t*21+11] * nW - gi0
ty5[b][best_n][gj0][gi0] = target[b][t*21+12] * nH - gj0
tx6[b][best_n][gj0][gi0] = target[b][t*21+13] * nW - gi0
ty6[b][best_n][gj0][gi0] = target[b][t*21+14] * nH - gj0
tx7[b][best_n][gj0][gi0] = target[b][t*21+15] * nW - gi0
ty7[b][best_n][gj0][gi0] = target[b][t*21+16] * nH - gj0
tx8[b][best_n][gj0][gi0] = target[b][t*21+17] * nW - gi0
ty8[b][best_n][gj0][gi0] = target[b][t*21+18] * nH - gj0
tconf[b][best_n][gj0][gi0] = conf
tcls[b][best_n][gj0][gi0] = target[b][t*21]
if conf > 0.5:
nCorrect = nCorrect + 1
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls
class RegionLoss(nn.Module):
def __init__(self, num_classes=0, anchors=[], num_anchors=1):
super(RegionLoss, self).__init__()
self.num_classes = num_classes
self.anchors = anchors
self.num_anchors = num_anchors
self.anchor_step = len(anchors)/num_anchors
self.coord_scale = 1
self.noobject_scale = 1
self.object_scale = 5
self.class_scale = 1
self.thresh = 0.6
self.seen = 0
def forward(self, output, target):
# Parameters
t0 = time.time()
nB = output.data.size(0)
nA = self.num_anchors
nC = self.num_classes
nH = output.data.size(2)
nW = output.data.size(3)
# Activation
output = output.view(nB, nA, (19+nC), nH, nW)
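# Per anchor the channel layout is: 18 keypoint coordinates (x0, y0, ..., x8, y8), 1 confidence score, then nC class scores; only x0/y0 and the confidence pass through a sigmoid, the remaining corner offsets are left unconstrained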
x0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW))
y0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW))
x1 = output.index_select(2, Variable(torch.cuda.LongTensor([2]))).view(nB, nA, nH, nW)
y1 = output.index_select(2, Variable(torch.cuda.LongTensor([3]))).view(nB, nA, nH, nW)
x2 = output.index_select(2, Variable(torch.cuda.LongTensor([4]))).view(nB, nA, nH, nW)
y2 = output.index_select(2, Variable(torch.cuda.LongTensor([5]))).view(nB, nA, nH, nW)
x3 = output.index_select(2, Variable(torch.cuda.LongTensor([6]))).view(nB, nA, nH, nW)
y3 = output.index_select(2, Variable(torch.cuda.LongTensor([7]))).view(nB, nA, nH, nW)
x4 = output.index_select(2, Variable(torch.cuda.LongTensor([8]))).view(nB, nA, nH, nW)
y4 = output.index_select(2, Variable(torch.cuda.LongTensor([9]))).view(nB, nA, nH, nW)
x5 = output.index_select(2, Variable(torch.cuda.LongTensor([10]))).view(nB, nA, nH, nW)
y5 = output.index_select(2, Variable(torch.cuda.LongTensor([11]))).view(nB, nA, nH, nW)
x6 = output.index_select(2, Variable(torch.cuda.LongTensor([12]))).view(nB, nA, nH, nW)
y6 = output.index_select(2, Variable(torch.cuda.LongTensor([13]))).view(nB, nA, nH, nW)
x7 = output.index_select(2, Variable(torch.cuda.LongTensor([14]))).view(nB, nA, nH, nW)
y7 = output.index_select(2, Variable(torch.cuda.LongTensor([15]))).view(nB, nA, nH, nW)
x8 = output.index_select(2, Variable(torch.cuda.LongTensor([16]))).view(nB, nA, nH, nW)
y8 = output.index_select(2, Variable(torch.cuda.LongTensor([17]))).view(nB, nA, nH, nW)
conf = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([18]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(19,19+nC-1,nC).long().cuda()))
cls = cls.view(nB*nA, nC, nH*nW).transpose(1,2).contiguous().view(nB*nA*nH*nW, nC)
t1 = time.time()
# Create pred boxes
pred_corners = torch.cuda.FloatTensor(18, nB*nA*nH*nW)
grid_x = torch.linspace(0, nW-1, nW).repeat(nH,1).repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
grid_y = torch.linspace(0, nH-1, nH).repeat(nW,1).t().repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
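# Keypoints are predicted relative to their grid cell; adding the cell index (grid_x, grid_y) and dividing by the grid size yields coordinates normalized to [0, 1], the same convention as the ground-truth labels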
pred_corners[0] = (x0.data + grid_x) / nW
pred_corners[1] = (y0.data + grid_y) / nH
pred_corners[2] = (x1.data + grid_x) / nW
pred_corners[3] = (y1.data + grid_y) / nH
pred_corners[4] = (x2.data + grid_x) / nW
pred_corners[5] = (y2.data + grid_y) / nH
pred_corners[6] = (x3.data + grid_x) / nW
pred_corners[7] = (y3.data + grid_y) / nH
pred_corners[8] = (x4.data + grid_x) / nW
pred_corners[9] = (y4.data + grid_y) / nH
pred_corners[10] = (x5.data + grid_x) / nW
pred_corners[11] = (y5.data + grid_y) / nH
pred_corners[12] = (x6.data + grid_x) / nW
pred_corners[13] = (y6.data + grid_y) / nH
pred_corners[14] = (x7.data + grid_x) / nW
pred_corners[15] = (y7.data + grid_y) / nH
pred_corners[16] = (x8.data + grid_x) / nW
pred_corners[17] = (y8.data + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,18)
pred_corners = convert2cpu(gpu_matrix)
t2 = time.time()
# Build targets
nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls = \
build_targets(pred_corners, target.data, self.anchors, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
cls_mask = (cls_mask == 1)
nProposals = int((conf > 0.25).sum().data[0])
tx0 = Variable(tx0.cuda())
ty0 = Variable(ty0.cuda())
tx1 = Variable(tx1.cuda())
ty1 = Variable(ty1.cuda())
tx2 = Variable(tx2.cuda())
ty2 = Variable(ty2.cuda())
tx3 = Variable(tx3.cuda())
ty3 = Variable(ty3.cuda())
tx4 = Variable(tx4.cuda())
ty4 = Variable(ty4.cuda())
tx5 = Variable(tx5.cuda())
ty5 = Variable(ty5.cuda())
tx6 = Variable(tx6.cuda())
ty6 = Variable(ty6.cuda())
tx7 = Variable(tx7.cuda())
ty7 = Variable(ty7.cuda())
tx8 = Variable(tx8.cuda())
ty8 = Variable(ty8.cuda())
tconf = Variable(tconf.cuda())
tcls = Variable(tcls.view(-1)[cls_mask].long().cuda())
coord_mask = Variable(coord_mask.cuda())
conf_mask = Variable(conf_mask.cuda().sqrt())
cls_mask = Variable(cls_mask.view(-1, 1).repeat(1,nC).cuda())
cls = cls[cls_mask].view(-1, nC)
t3 = time.time()
# Create loss
loss_x0 = self.coord_scale * nn.MSELoss(size_average=False)(x0*coord_mask, tx0*coord_mask)/2.0
loss_y0 = self.coord_scale * nn.MSELoss(size_average=False)(y0*coord_mask, ty0*coord_mask)/2.0
loss_x1 = self.coord_scale * nn.MSELoss(size_average=False)(x1*coord_mask, tx1*coord_mask)/2.0
loss_y1 = self.coord_scale * nn.MSELoss(size_average=False)(y1*coord_mask, ty1*coord_mask)/2.0
loss_x2 = self.coord_scale * nn.MSELoss(size_average=False)(x2*coord_mask, tx2*coord_mask)/2.0
loss_y2 = self.coord_scale * nn.MSELoss(size_average=False)(y2*coord_mask, ty2*coord_mask)/2.0
loss_x3 = self.coord_scale * nn.MSELoss(size_average=False)(x3*coord_mask, tx3*coord_mask)/2.0
loss_y3 = self.coord_scale * nn.MSELoss(size_average=False)(y3*coord_mask, ty3*coord_mask)/2.0
loss_x4 = self.coord_scale * nn.MSELoss(size_average=False)(x4*coord_mask, tx4*coord_mask)/2.0
loss_y4 = self.coord_scale * nn.MSELoss(size_average=False)(y4*coord_mask, ty4*coord_mask)/2.0
loss_x5 = self.coord_scale * nn.MSELoss(size_average=False)(x5*coord_mask, tx5*coord_mask)/2.0
loss_y5 = self.coord_scale * nn.MSELoss(size_average=False)(y5*coord_mask, ty5*coord_mask)/2.0
loss_x6 = self.coord_scale * nn.MSELoss(size_average=False)(x6*coord_mask, tx6*coord_mask)/2.0
loss_y6 = self.coord_scale * nn.MSELoss(size_average=False)(y6*coord_mask, ty6*coord_mask)/2.0
loss_x7 = self.coord_scale * nn.MSELoss(size_average=False)(x7*coord_mask, tx7*coord_mask)/2.0
loss_y7 = self.coord_scale * nn.MSELoss(size_average=False)(y7*coord_mask, ty7*coord_mask)/2.0
loss_x8 = self.coord_scale * nn.MSELoss(size_average=False)(x8*coord_mask, tx8*coord_mask)/2.0
loss_y8 = self.coord_scale * nn.MSELoss(size_average=False)(y8*coord_mask, ty8*coord_mask)/2.0
loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
# loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
loss_cls = 0
loss_x = loss_x0 + loss_x1 + loss_x2 + loss_x3 + loss_x4 + loss_x5 + loss_x6 + loss_x7 + loss_x8
loss_y = loss_y0 + loss_y1 + loss_y2 + loss_y3 + loss_y4 + loss_y5 + loss_y6 + loss_y7 + loss_y8
if False:
loss = loss_x + loss_y + loss_conf + loss_cls
else:
loss = loss_x + loss_y + loss_conf
t4 = time.time()
if False:
print('-----------------------------------')
print(' activation : %f' % (t1 - t0))
print(' create pred_corners : %f' % (t2 - t1))
print(' build targets : %f' % (t3 - t2))
print(' create loss : %f' % (t4 - t3))
print(' total : %f' % (t4 - t0))
if False:
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, cls %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss_cls.data[0], loss.data[0]))
else:
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss.data[0]))
return loss

417
py2/train.py Normal file

@@ -0,0 +1,417 @@
from __future__ import print_function
import sys
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
import numpy as np
import os
import random
import math
import shutil
from torchvision import datasets, transforms
from torch.autograd import Variable # Useful info about autograd: http://pytorch.org/docs/master/notes/autograd.html
import dataset
from utils import *
from cfg import parse_cfg
from region_loss import RegionLoss
from darknet import Darknet
from MeshPly import MeshPly
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
# Adjust learning rate during training, learning schedule can be changed in network config file
def adjust_learning_rate(optimizer, batch):
lr = learning_rate
for i in range(len(steps)):
scale = scales[i] if i < len(scales) else 1
if batch >= steps[i]:
lr = lr * scale
if batch == steps[i]:
break
else:
break
for param_group in optimizer.param_groups:
param_group['lr'] = lr/batch_size
return lr
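# The schedule is piecewise constant: each time the processed-batch count passes an entry in 'steps', the learning rate is multiplied by the corresponding entry in 'scales'. The division of lr by batch_size (mirrored by the decay*batch_size weight decay when the optimizer is built) presumably compensates for the loss being summed, rather than averaged, over the batch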
def train(epoch):
global processed_batches
# Initialize timer
t0 = time.time()
# Get the dataloader for training dataset
train_loader = torch.utils.data.DataLoader(dataset.listDataset(trainlist, shape=(init_width, init_height),
shuffle=True,
transform=transforms.Compose([transforms.ToTensor(),]),
train=True,
seen=model.seen,
batch_size=batch_size,
num_workers=num_workers, bg_file_names=bg_file_names),
batch_size=batch_size, shuffle=False, **kwargs)
# TRAINING
lr = adjust_learning_rate(optimizer, processed_batches)
logging('epoch %d, processed %d samples, lr %f' % (epoch, epoch * len(train_loader.dataset), lr))
# Start training
model.train()
t1 = time.time()
avg_time = torch.zeros(9)
niter = 0
# Iterate through batches
for batch_idx, (data, target) in enumerate(train_loader):
t2 = time.time()
# adjust learning rate
adjust_learning_rate(optimizer, processed_batches)
processed_batches = processed_batches + 1
# Pass the data to GPU
if use_cuda:
data = data.cuda()
t3 = time.time()
# Wrap tensors in Variable class for automatic differentiation
data, target = Variable(data), Variable(target)
t4 = time.time()
# Zero the gradients before running the backward pass
optimizer.zero_grad()
t5 = time.time()
# Forward pass
output = model(data)
t6 = time.time()
model.seen = model.seen + data.data.size(0)
region_loss.seen = region_loss.seen + data.data.size(0)
# Compute loss, grow an array of losses for saving later on
loss = region_loss(output, target)
training_iters.append(epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter)
training_losses.append(convert2cpu(loss.data))
niter += 1
t7 = time.time()
# Backprop: compute gradient of the loss with respect to model parameters
loss.backward()
t8 = time.time()
# Update weights
optimizer.step()
t9 = time.time()
# Print time statistics
if False and batch_idx > 1:
avg_time[0] = avg_time[0] + (t2-t1)
avg_time[1] = avg_time[1] + (t3-t2)
avg_time[2] = avg_time[2] + (t4-t3)
avg_time[3] = avg_time[3] + (t5-t4)
avg_time[4] = avg_time[4] + (t6-t5)
avg_time[5] = avg_time[5] + (t7-t6)
avg_time[6] = avg_time[6] + (t8-t7)
avg_time[7] = avg_time[7] + (t9-t8)
avg_time[8] = avg_time[8] + (t9-t1)
print('-------------------------------')
print(' load data : %f' % (avg_time[0]/(batch_idx)))
print(' cpu to cuda : %f' % (avg_time[1]/(batch_idx)))
print('cuda to variable : %f' % (avg_time[2]/(batch_idx)))
print(' zero_grad : %f' % (avg_time[3]/(batch_idx)))
print(' forward feature : %f' % (avg_time[4]/(batch_idx)))
print(' forward loss : %f' % (avg_time[5]/(batch_idx)))
print(' backward : %f' % (avg_time[6]/(batch_idx)))
print(' step : %f' % (avg_time[7]/(batch_idx)))
print(' total : %f' % (avg_time[8]/(batch_idx)))
t1 = time.time()
t1 = time.time()
return epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter - 1
def test(epoch, niter):
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
return i
# Set the module in evaluation mode (disables dropout, switches batch normalization to its running statistics, etc.)
model.eval()
# Parameters
num_classes = model.num_classes
anchors = model.anchors
num_anchors = model.num_anchors
testtime = True
testing_error_trans = 0.0
testing_error_angle = 0.0
testing_error_pixel = 0.0
testing_samples = 0.0
errs_2d = []
errs_3d = []
errs_trans = []
errs_angle = []
errs_corner2D = []
logging(" Testing...")
logging(" Number of test samples: %d" % len(test_loader.dataset))
notpredicted = 0
# Iterate through test examples
for batch_idx, (data, target) in enumerate(test_loader):
t1 = time.time()
# Pass the data to GPU
if use_cuda:
data = data.cuda()
target = target.cuda()
# Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference
data = Variable(data, volatile=True)
t2 = time.time()
# Forward pass
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
all_boxes = get_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors)
t4 = time.time()
# Iterate through all batch elements
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction
for j in range(len(boxes)):
if boxes[j][18] > best_conf_est:
best_conf_est = boxes[j][18]
box_pr = boxes[j]
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
corners2D_pr[:, 1] = corners2D_pr[:, 1] * im_height
# Compute corner prediction error
corner_norm = np.linalg.norm(corners2D_gt - corners2D_pr, axis=1)
corner_dist = np.mean(corner_norm)
errs_corner2D.append(corner_dist)
# Compute [R|t] by pnp
R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(internal_calibration, dtype='float32'))
R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))
# Compute errors
# Compute translation error
trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))
errs_trans.append(trans_dist)
# Compute angle error
angle_dist = calcAngularDistance(R_gt, R_pr)
errs_angle.append(angle_dist)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
# Compute 3D distances
transform_3d_gt = compute_transformation(vertices, Rt_gt)
transform_3d_pred = compute_transformation(vertices, Rt_pr)
norm3d = np.linalg.norm(transform_3d_gt - transform_3d_pred, axis=0)
vertex_dist = np.mean(norm3d)
errs_3d.append(vertex_dist)
# Sum errors
testing_error_trans += trans_dist
testing_error_angle += angle_dist
testing_error_pixel += pixel_dist
testing_samples += 1
t5 = time.time()
# Compute 2D projection, 6D pose and 5cm5degree scores
px_threshold = 5
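# Three standard pose metrics are reported below: 2D projection (mean reprojection error of the model vertices <= 5 px), 3D transformation/ADD (mean vertex distance <= vx_threshold, i.e. 10% of the object diameter) and 5cm5deg (translation error <= 5 cm and rotation error <= 5 degrees)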
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc3d = len(np.where(np.array(errs_3d) <= vx_threshold)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
mean_corner_err_2d = np.mean(errs_corner2D)
nts = float(testing_samples)
if testtime:
print('-----------------------------------')
print(' tensor to cuda : %f' % (t2 - t1))
print(' predict : %f' % (t3 - t2))
print('get_region_boxes : %f' % (t4 - t3))
print(' eval : %f' % (t5 - t4))
print(' total : %f' % (t5 - t1))
print('-----------------------------------')
# Print test statistics
logging(" Mean corner error is %f" % (mean_corner_err_2d))
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
logging(' Acc using {} vx 3D Transformation = {:.2f}%'.format(vx_threshold, acc3d))
logging(' Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))
logging(' Translation error: %f, angle error: %f' % (testing_error_trans/(nts+eps), testing_error_angle/(nts+eps)) )
# Register losses and errors for saving later on
testing_iters.append(niter)
testing_errors_trans.append(testing_error_trans/(nts+eps))
testing_errors_angle.append(testing_error_angle/(nts+eps))
testing_errors_pixel.append(testing_error_pixel/(nts+eps))
testing_accuracies.append(acc)
if __name__ == "__main__":
# Training settings
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
# Parse configuration files
data_options = read_data_cfg(datacfg)
net_options = parse_cfg(cfgfile)[0]
trainlist = data_options['train']
testlist = data_options['valid']
nsamples = file_lines(trainlist)
gpus = data_options['gpus'] # e.g. 0,1,2,3
gpus = '0'
meshname = data_options['mesh']
num_workers = int(data_options['num_workers'])
backupdir = data_options['backup']
diam = float(data_options['diam'])
vx_threshold = diam * 0.1
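# A pose counts as correct under the 3D transformation (ADD) metric when the mean vertex distance is below 10% of the object diameter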
if not os.path.exists(backupdir):
makedirs(backupdir)
batch_size = int(net_options['batch'])
max_batches = int(net_options['max_batches'])
learning_rate = float(net_options['learning_rate'])
momentum = float(net_options['momentum'])
decay = float(net_options['decay'])
steps = [float(step) for step in net_options['steps'].split(',')]
scales = [float(scale) for scale in net_options['scales'].split(',')]
bg_file_names = get_all_files('VOCdevkit/VOC2012/JPEGImages')
# Train parameters
max_epochs = 700 # max_batches*batch_size/nsamples+1
use_cuda = True
seed = int(time.time())
eps = 1e-5
save_interval = 10 # epochs
dot_interval = 70 # batches
best_acc = -1
# Test parameters
conf_thresh = 0.1
nms_thresh = 0.4
iou_thresh = 0.5
im_width = 640
im_height = 480
# Specify which gpus to use
torch.manual_seed(seed)
if use_cuda:
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
# Specify the model and the loss
model = Darknet(cfgfile)
region_loss = model.loss
# Model settings
# model.load_weights(weightfile)
model.load_weights_until_last(weightfile)
model.print_network()
model.seen = 0
region_loss.iter = model.iter
region_loss.seen = model.seen
processed_batches = model.seen/batch_size
init_width = model.width
init_height = model.height
test_width = 672
test_height = 672
init_epoch = model.seen/nsamples
# Variable to save
training_iters = []
training_losses = []
testing_iters = []
testing_losses = []
testing_errors_trans = []
testing_errors_angle = []
testing_errors_pixel = []
testing_accuracies = []
# Get the intrinsic camera matrix, mesh, vertices and corners of the model
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
internal_calibration = get_camera_intrinsic()
print("vertex unit")
print(np.max(vertices))
print(np.min(vertices))
# Specify the number of workers
kwargs = {'num_workers': num_workers, 'pin_memory': True} if use_cuda else {}
# Get the dataloader for test data
test_loader = torch.utils.data.DataLoader(dataset.listDataset(testlist, shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([transforms.ToTensor(),]),
train=False),
batch_size=1, shuffle=False, **kwargs)
# Pass the model to GPU
if use_cuda:
model = model.cuda() # model = torch.nn.DataParallel(model, device_ids=[0]).cuda() # Multiple GPU parallelism
# Get the optimizer
params_dict = dict(model.named_parameters())
params = []
for key, value in params_dict.items():
if key.find('.bn') >= 0 or key.find('.bias') >= 0:
params += [{'params': [value], 'weight_decay': 0.0}]
else:
params += [{'params': [value], 'weight_decay': decay*batch_size}]
optimizer = optim.SGD(model.parameters(), lr=learning_rate/batch_size, momentum=momentum, dampening=0, weight_decay=decay*batch_size)
# optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimization
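# Note: the per-parameter 'params' list assembled above (no weight decay on biases and batch-norm parameters) is not actually passed to the optimizer; SGD is built directly from model.parameters() with a uniform weight decay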
evaluate = False
if evaluate:
logging('evaluating ...')
test(0, 0)
else:
for epoch in range(init_epoch, max_epochs):
# TRAIN
niter = train(epoch)
# TEST and SAVE
if (epoch % 10 == 0) and (epoch != 0):
test(epoch, niter)
logging('save training stats to %s/costs.npz' % (backupdir))
np.savez(os.path.join(backupdir, "costs.npz"),
training_iters=training_iters,
training_losses=training_losses,
testing_iters=testing_iters,
testing_accuracies=testing_accuracies,
testing_errors_pixel=testing_errors_pixel,
testing_errors_angle=testing_errors_angle)
if (testing_accuracies[-1] > best_acc ):
best_acc = testing_accuracies[-1]
logging('best model so far!')
logging('save weights to %s/model.weights' % (backupdir))
model.save_weights('%s/model.weights' % (backupdir))
shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))

1058
py2/utils.py Normal file

The diff for this file is not shown because of its large size.

357
py2/valid.ipynb Normal file

@@ -0,0 +1,357 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import os\n",
"import time\n",
"import torch\n",
"from torch.autograd import Variable\n",
"from torchvision import datasets, transforms\n",
"import scipy.io\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"import matplotlib.pyplot as plt\n",
"import scipy.misc\n",
"\n",
"from darknet import Darknet\n",
"import dataset\n",
"from utils import *\n",
"from MeshPly import MeshPly\n",
"\n",
"# Create new directory\n",
"def makedirs(path):\n",
" if not os.path.exists( path ):\n",
" os.makedirs( path )"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"ename": "IOError",
"evalue": "[Errno 2] No such file or directory: 'LINEMOD/ape/ape.ply'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIOError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-7-cd23ddeac3d5>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 272\u001b[0m \u001b[0mcfgfile\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'cfg/yolo-pose.cfg'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 273\u001b[0m \u001b[0mweightfile\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'backup/ape/model_backup.weights'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 274\u001b[0;31m \u001b[0mvalid\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdatacfg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcfgfile\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mweightfile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-7-cd23ddeac3d5>\u001b[0m in \u001b[0;36mvalid\u001b[0;34m(datacfg, cfgfile, weightfile)\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0;31m# Read object model information, get 3D bounding box corners\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0mmesh\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMeshPly\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmeshname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0mvertices\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mc_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmesh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mones\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmesh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtranspose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0mcorners3D\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_3D_corners\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/cvlabdata1/home/btekin/ope/singleshotpose_release/MeshPly.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, filename, color)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'r'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mIOError\u001b[0m: [Errno 2] No such file or directory: 'LINEMOD/ape/ape.ply'"
]
}
],
"source": [
"def valid(datacfg, cfgfile, weightfile):\n",
" def truths_length(truths):\n",
" for i in range(50):\n",
" if truths[i][1] == 0:\n",
" return i\n",
"\n",
" # Parse configuration files\n",
" options = read_data_cfg(datacfg)\n",
" valid_images = options['valid']\n",
" meshname = options['mesh']\n",
" backupdir = options['backup']\n",
" name = options['name']\n",
" if not os.path.exists(backupdir):\n",
" makedirs(backupdir)\n",
"\n",
" # Parameters\n",
" prefix = 'results'\n",
" seed = int(time.time())\n",
" gpus = '0' # Specify which gpus to use\n",
" test_width = 544\n",
" test_height = 544\n",
" torch.manual_seed(seed)\n",
" use_cuda = True\n",
" if use_cuda:\n",
" os.environ['CUDA_VISIBLE_DEVICES'] = gpus\n",
" torch.cuda.manual_seed(seed)\n",
" save = False\n",
" visualize = True\n",
" testtime = True\n",
" use_cuda = True\n",
" num_classes = 1\n",
" testing_samples = 0.0\n",
" eps = 1e-5\n",
" notpredicted = 0 \n",
" conf_thresh = 0.1\n",
" nms_thresh = 0.4\n",
" match_thresh = 0.5\n",
" edges_corners = [[0, 1], [0, 2], [0, 4], [1, 3], [1, 5], [2, 3], [2, 6], [3, 7], [4, 5], [4, 6], [5, 7], [6, 7]]\n",
"\n",
" if save:\n",
" makedirs(backupdir + '/test')\n",
" makedirs(backupdir + '/test/gt')\n",
" makedirs(backupdir + '/test/pr')\n",
"\n",
" # To save\n",
" testing_error_trans = 0.0\n",
" testing_error_angle = 0.0\n",
" testing_error_pixel = 0.0\n",
" errs_2d = []\n",
" errs_3d = []\n",
" errs_trans = []\n",
" errs_angle = []\n",
" errs_corner2D = []\n",
" preds_trans = []\n",
" preds_rot = []\n",
" preds_corners2D = []\n",
" gts_trans = []\n",
" gts_rot = []\n",
" gts_corners2D = []\n",
" ious = []\n",
"\n",
" # Read object model information, get 3D bounding box corners\n",
" mesh = MeshPly(meshname)\n",
" vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()\n",
" corners3D = get_3D_corners(vertices)\n",
" # diam = calc_pts_diameter(np.array(mesh.vertices))\n",
" diam = float(options['diam'])\n",
"\n",
" # Read intrinsic camera parameters\n",
" internal_calibration = get_camera_intrinsic()\n",
"\n",
" # Get validation file names\n",
" with open(valid_images) as fp:\n",
" tmp_files = fp.readlines()\n",
" valid_files = [item.rstrip() for item in tmp_files]\n",
" \n",
" # Specicy model, load pretrained weights, pass to GPU and set the module in evaluation mode\n",
" model = Darknet(cfgfile)\n",
" model.print_network()\n",
" model.load_weights(weightfile)\n",
" model.cuda()\n",
" model.eval()\n",
"\n",
" # Get the parser for the test dataset\n",
" valid_dataset = dataset.listDataset(valid_images, shape=(test_width, test_height),\n",
" shuffle=False,\n",
" transform=transforms.Compose([\n",
" transforms.ToTensor(),]))\n",
" valid_batchsize = 1\n",
"\n",
" # Specify the number of workers for multiple processing, get the dataloader for the test dataset\n",
" kwargs = {'num_workers': 4, 'pin_memory': True}\n",
" test_loader = torch.utils.data.DataLoader(\n",
" valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs) \n",
"\n",
" logging(\" Testing {}...\".format(name))\n",
" logging(\" Number of test samples: %d\" % len(test_loader.dataset))\n",
" # Iterate through test batches (Batch size for test data is 1)\n",
" count = 0\n",
" z = np.zeros((3, 1))\n",
" for batch_idx, (data, target) in enumerate(test_loader):\n",
" \n",
" # Images\n",
" img = data[0, :, :, :]\n",
" img = img.numpy().squeeze()\n",
" img = np.transpose(img, (1, 2, 0))\n",
" \n",
" t1 = time.time()\n",
" # Pass data to GPU\n",
" if use_cuda:\n",
" data = data.cuda()\n",
" target = target.cuda()\n",
" \n",
" # Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference\n",
" data = Variable(data, volatile=True)\n",
" t2 = time.time()\n",
" \n",
" # Forward pass\n",
" output = model(data).data \n",
" t3 = time.time()\n",
" \n",
" # Using confidence threshold, eliminate low-confidence predictions\n",
" all_boxes = get_region_boxes(output, conf_thresh, num_classes) \n",
" t4 = time.time()\n",
"\n",
" # Iterate through all images in the batch\n",
" for i in range(output.size(0)):\n",
" \n",
" # For each image, get all the predictions\n",
" boxes = all_boxes[i]\n",
" \n",
" # For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)\n",
" truths = target[i].view(-1, 21)\n",
" \n",
" # Get how many object are present in the scene\n",
" num_gts = truths_length(truths)\n",
"\n",
" # Iterate through each ground-truth object\n",
" for k in range(num_gts):\n",
" box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6], \n",
" truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12], \n",
" truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]\n",
" best_conf_est = -1\n",
"\n",
" # If the prediction has the highest confidence, choose it as our prediction for single object pose estimation\n",
" for j in range(len(boxes)):\n",
" if (boxes[j][18] > best_conf_est):\n",
" match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))\n",
" box_pr = boxes[j]\n",
" best_conf_est = boxes[j][18]\n",
"\n",
" # Denormalize the corner predictions \n",
" corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')\n",
" corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')\n",
" corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640\n",
" corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480 \n",
" corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640\n",
" corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480\n",
" preds_corners2D.append(corners2D_pr)\n",
" gts_corners2D.append(corners2D_gt)\n",
"\n",
" # Compute corner prediction error\n",
" corner_norm = np.linalg.norm(corners2D_gt - corners2D_pr, axis=1)\n",
" corner_dist = np.mean(corner_norm)\n",
" errs_corner2D.append(corner_dist)\n",
" \n",
" # Compute [R|t] by pnp\n",
" R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(internal_calibration, dtype='float32'))\n",
" R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))\n",
"\n",
" if save:\n",
" preds_trans.append(t_pr)\n",
" gts_trans.append(t_gt)\n",
" preds_rot.append(R_pr)\n",
" gts_rot.append(R_gt)\n",
"\n",
" np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))\n",
" \n",
" # Compute translation error\n",
" trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))\n",
" errs_trans.append(trans_dist)\n",
" \n",
" # Compute angle error\n",
" angle_dist = calcAngularDistance(R_gt, R_pr)\n",
" errs_angle.append(angle_dist)\n",
" \n",
" # Compute pixel error\n",
" Rt_gt = np.concatenate((R_gt, t_gt), axis=1)\n",
" Rt_pr = np.concatenate((R_pr, t_pr), axis=1)\n",
" proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)\n",
" proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration) \n",
" proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration)) \n",
" proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration)) \n",
" norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)\n",
" pixel_dist = np.mean(norm)\n",
" errs_2d.append(pixel_dist)\n",
"\n",
" if visualize:\n",
" # Visualize\n",
" plt.xlim((0, 640))\n",
" plt.ylim((0, 480))\n",
" plt.imshow(scipy.misc.imresize(img, (480, 640)))\n",
" # Projections\n",
" for edge in edges_corners:\n",
" plt.plot(proj_corners_gt[edge, 0], proj_corners_gt[edge, 1], color='g', linewidth=3.0)\n",
" plt.plot(proj_corners_pr[edge, 0], proj_corners_pr[edge, 1], color='b', linewidth=3.0)\n",
" plt.gca().invert_yaxis()\n",
" plt.show()\n",
" \n",
" # Compute IoU score\n",
" bb_gt = compute_2d_bb_from_orig_pix(proj_2d_gt, output.size(3))\n",
" bb_pred = compute_2d_bb_from_orig_pix(proj_2d_pred, output.size(3))\n",
" iou = bbox_iou(bb_gt, bb_pred)\n",
" ious.append(iou)\n",
"\n",
" # Compute 3D distances\n",
" transform_3d_gt = compute_transformation(vertices, Rt_gt) \n",
" transform_3d_pred = compute_transformation(vertices, Rt_pr) \n",
" norm3d = np.linalg.norm(transform_3d_gt - transform_3d_pred, axis=0)\n",
" vertex_dist = np.mean(norm3d) \n",
" errs_3d.append(vertex_dist) \n",
"\n",
" # Sum errors\n",
" testing_error_trans += trans_dist\n",
" testing_error_angle += angle_dist\n",
" testing_error_pixel += pixel_dist\n",
" testing_samples += 1\n",
" count = count + 1\n",
"\n",
" t5 = time.time()\n",
"\n",
" # Compute 2D projection error, 6D pose error, 5cm5degree error\n",
" px_threshold = 5\n",
" acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)\n",
" acciou = len(np.where(np.array(errs_2d) >= 0.5)[0]) * 100. / (len(ious)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)\n",
" mean_err_2d = np.mean(errs_2d)\n",
" mean_corner_err_2d = np.mean(errs_corner2D)\n",
" nts = float(testing_samples)\n",
"\n",
" if testtime:\n",
" print('-----------------------------------')\n",
" print(' tensor to cuda : %f' % (t2 - t1))\n",
" print(' predict : %f' % (t3 - t2))\n",
" print('get_region_boxes : %f' % (t4 - t3))\n",
" print(' nms : %f' % (t5 - t4))\n",
" print(' total : %f' % (t5 - t1))\n",
" print('-----------------------------------')\n",
"\n",
" # Print test statistics\n",
" logging('Results of {}'.format(name))\n",
" logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))\n",
" logging(' Acc using the IoU metric = {:.6f}%'.format(acciou))\n",
" logging(' Acc using 10% threshold - {} vx 3D Transformation = {:.2f}%'.format(diam * 0.1, acc3d10))\n",
" logging(' Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))\n",
" logging(\" Mean 2D pixel error is %f, Mean vertex error is %f, mean corner error is %f\" % (mean_err_2d, np.mean(errs_3d), mean_corner_err_2d))\n",
" logging(' Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )\n",
"\n",
" if save:\n",
" predfile = backupdir + '/predictions_linemod_' + name + '.mat'\n",
" scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts':gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs':preds_trans, 'corner_prs': preds_corners2D})\n",
"\n",
"datacfg = 'cfg/ape.data'\n",
"cfgfile = 'cfg/yolo-pose.cfg'\n",
"weightfile = 'backup/ape/model_backup.weights'\n",
"valid(datacfg, cfgfile, weightfile)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

269
py2/valid.py Normal file

@@ -0,0 +1,269 @@
import os
import time
import torch
from torch.autograd import Variable
from torchvision import datasets, transforms
import scipy.io
import warnings
warnings.filterwarnings("ignore")
from darknet import Darknet
import dataset
from utils import *
from MeshPly import MeshPly
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
def valid(datacfg, cfgfile, weightfile, outfile):
def truths_length(truths):
for i in range(50):
if truths[i][1] == 0:
return i
# Parse configuration files
options = read_data_cfg(datacfg)
valid_images = options['valid']
meshname = options['mesh']
backupdir = options['backup']
name = options['name']
if not os.path.exists(backupdir):
makedirs(backupdir)
# Parameters
prefix = 'results'
seed = int(time.time())
gpus = '0' # Specify which gpus to use
test_width = 544
test_height = 544
torch.manual_seed(seed)
use_cuda = True
if use_cuda:
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
save = False
testtime = True
use_cuda = True
num_classes = 1
testing_samples = 0.0
eps = 1e-5
notpredicted = 0
conf_thresh = 0.1
nms_thresh = 0.4
match_thresh = 0.5
if save:
makedirs(backupdir + '/test')
makedirs(backupdir + '/test/gt')
makedirs(backupdir + '/test/pr')
# To save
testing_error_trans = 0.0
testing_error_angle = 0.0
testing_error_pixel = 0.0
errs_2d = []
errs_3d = []
errs_trans = []
errs_angle = []
errs_corner2D = []
preds_trans = []
preds_rot = []
preds_corners2D = []
gts_trans = []
gts_rot = []
gts_corners2D = []
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
# diam = calc_pts_diameter(np.array(mesh.vertices))
diam = float(options['diam'])
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
# Get validation file names
with open(valid_images) as fp:
tmp_files = fp.readlines()
valid_files = [item.rstrip() for item in tmp_files]
# Specify the model, load pretrained weights, pass to GPU and set the module in evaluation mode
model = Darknet(cfgfile)
model.print_network()
model.load_weights(weightfile)
model.cuda()
model.eval()
# Get the parser for the test dataset
valid_dataset = dataset.listDataset(valid_images, shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([
transforms.ToTensor(),]))
valid_batchsize = 1
# Specify the number of workers for multiprocessing and get the dataloader for the test dataset
kwargs = {'num_workers': 4, 'pin_memory': True}
test_loader = torch.utils.data.DataLoader(
valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs)
logging(" Testing {}...".format(name))
logging(" Number of test samples: %d" % len(test_loader.dataset))
# Iterate through test batches (Batch size for test data is 1)
count = 0
z = np.zeros((3, 1))
for batch_idx, (data, target) in enumerate(test_loader):
t1 = time.time()
# Pass data to GPU
if use_cuda:
data = data.cuda()
target = target.cuda()
# Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference
data = Variable(data, volatile=True)
t2 = time.time()
# Forward pass
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
all_boxes = get_region_boxes(output, conf_thresh, num_classes)
t4 = time.time()
# Iterate through all images in the batch
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction for single object pose estimation
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est):
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
box_pr = boxes[j]
best_conf_est = boxes[j][18]
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640
corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480
corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640
corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480
preds_corners2D.append(corners2D_pr)
gts_corners2D.append(corners2D_gt)
# Compute corner prediction error
corner_norm = np.linalg.norm(corners2D_gt - corners2D_pr, axis=1)
corner_dist = np.mean(corner_norm)
errs_corner2D.append(corner_dist)
# Compute [R|t] by pnp
R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(internal_calibration, dtype='float32'))
R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))
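# pnp() recovers [R|t] from 2D-3D correspondences: the nine object-frame points (the origin/centroid plus the eight 3D box corners) are matched against the ground-truth and predicted 2D corners under the given camera intrinsics (presumably via OpenCV's solvePnP inside pnp)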
if save:
preds_trans.append(t_pr)
gts_trans.append(t_gt)
preds_rot.append(R_pr)
gts_rot.append(R_gt)
np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))
np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))
np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))
np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))
# Compute translation error
trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))
errs_trans.append(trans_dist)
# Compute angle error
angle_dist = calcAngularDistance(R_gt, R_pr)
errs_angle.append(angle_dist)
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
# Compute 3D distances
transform_3d_gt = compute_transformation(vertices, Rt_gt)
transform_3d_pred = compute_transformation(vertices, Rt_pr)
norm3d = np.linalg.norm(transform_3d_gt - transform_3d_pred, axis=0)
vertex_dist = np.mean(norm3d)
errs_3d.append(vertex_dist)
# Sum errors
testing_error_trans += trans_dist
testing_error_angle += angle_dist
testing_error_pixel += pixel_dist
testing_samples += 1
count = count + 1
t5 = time.time()
# Compute 2D projection error, 6D pose error, 5cm5degree error
px_threshold = 5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
mean_corner_err_2d = np.mean(errs_corner2D)
nts = float(testing_samples)
if testtime:
print('-----------------------------------')
print(' tensor to cuda : %f' % (t2 - t1))
print(' predict : %f' % (t3 - t2))
print('get_region_boxes : %f' % (t4 - t3))
print(' eval : %f' % (t5 - t4))
print(' total : %f' % (t5 - t1))
print('-----------------------------------')
# Print test statistics
logging('Results of {}'.format(name))
logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))
logging(' Acc using 10% threshold - {} vx 3D Transformation = {:.2f}%'.format(diam * 0.1, acc3d10))
logging(' Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))
logging(" Mean 2D pixel error is %f, Mean vertex error is %f, mean corner error is %f" % (mean_err_2d, np.mean(errs_3d), mean_corner_err_2d))
logging(' Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
if save:
predfile = backupdir + '/predictions_linemod_' + name + '.mat'
scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts':gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs':preds_trans, 'corner_prs': preds_corners2D})
if __name__ == '__main__':
import sys
if len(sys.argv) == 4:
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
outfile = 'comp4_det_test_'
valid(datacfg, cfgfile, weightfile, outfile)
else:
print('Usage:')
print(' python valid.py datacfg cfgfile weightfile')


@@ -6,165 +6,93 @@ import torch.nn.functional as F
from torch.autograd import Variable
from utils import *
def build_targets(pred_corners, target, anchors, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
def build_targets(pred_corners, target, num_keypoints, num_anchors, num_classes, nH, nW, noobject_scale, object_scale, sil_thresh, seen):
nB = target.size(0)
nA = num_anchors
nC = num_classes
anchor_step = len(anchors)/num_anchors
conf_mask = torch.ones(nB, nA, nH, nW) * noobject_scale
coord_mask = torch.zeros(nB, nA, nH, nW)
cls_mask = torch.zeros(nB, nA, nH, nW)
tx0 = torch.zeros(nB, nA, nH, nW)
ty0 = torch.zeros(nB, nA, nH, nW)
tx1 = torch.zeros(nB, nA, nH, nW)
ty1 = torch.zeros(nB, nA, nH, nW)
tx2 = torch.zeros(nB, nA, nH, nW)
ty2 = torch.zeros(nB, nA, nH, nW)
tx3 = torch.zeros(nB, nA, nH, nW)
ty3 = torch.zeros(nB, nA, nH, nW)
tx4 = torch.zeros(nB, nA, nH, nW)
ty4 = torch.zeros(nB, nA, nH, nW)
tx5 = torch.zeros(nB, nA, nH, nW)
ty5 = torch.zeros(nB, nA, nH, nW)
tx6 = torch.zeros(nB, nA, nH, nW)
ty6 = torch.zeros(nB, nA, nH, nW)
tx7 = torch.zeros(nB, nA, nH, nW)
ty7 = torch.zeros(nB, nA, nH, nW)
tx8 = torch.zeros(nB, nA, nH, nW)
ty8 = torch.zeros(nB, nA, nH, nW)
tconf = torch.zeros(nB, nA, nH, nW)
tcls = torch.zeros(nB, nA, nH, nW)
txs = list()
tys = list()
for i in range(num_keypoints):
txs.append(torch.zeros(nB, nA, nH, nW))
tys.append(torch.zeros(nB, nA, nH, nW))
tconf = torch.zeros(nB, nA, nH, nW)
tcls = torch.zeros(nB, nA, nH, nW)
num_labels = 2 * num_keypoints + 3 # +2 for width, height and +1 for class within label files
nAnchors = nA*nH*nW
nPixels = nH*nW
for b in range(nB):
cur_pred_corners = pred_corners[b*nAnchors:(b+1)*nAnchors].t()
cur_confs = torch.zeros(nAnchors)
for t in range(50):
if target[b][t*21+1] == 0:
if target[b][t*num_labels+1] == 0:
break
gx0 = target[b][t*21+1]*nW
gy0 = target[b][t*21+2]*nH
gx1 = target[b][t*21+3]*nW
gy1 = target[b][t*21+4]*nH
gx2 = target[b][t*21+5]*nW
gy2 = target[b][t*21+6]*nH
gx3 = target[b][t*21+7]*nW
gy3 = target[b][t*21+8]*nH
gx4 = target[b][t*21+9]*nW
gy4 = target[b][t*21+10]*nH
gx5 = target[b][t*21+11]*nW
gy5 = target[b][t*21+12]*nH
gx6 = target[b][t*21+13]*nW
gy6 = target[b][t*21+14]*nH
gx7 = target[b][t*21+15]*nW
gy7 = target[b][t*21+16]*nH
gx8 = target[b][t*21+17]*nW
gy8 = target[b][t*21+18]*nH
g = list()
for i in range(num_keypoints):
g.append(target[b][t*num_labels+2*i+1])
g.append(target[b][t*num_labels+2*i+2])
cur_gt_corners = torch.FloatTensor([gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]).repeat(nAnchors,1).t() # 16 x nAnchors
cur_confs = torch.max(cur_confs, corner_confidences9(cur_pred_corners, cur_gt_corners)) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
cur_gt_corners = torch.FloatTensor(g).repeat(nAnchors,1).t() # 2*num_keypoints x nAnchors
cur_confs = torch.max(cur_confs, corner_confidences(cur_pred_corners, cur_gt_corners)).view_as(conf_mask[b]) # some irrelevant areas are filtered, in the same grid multiple anchor boxes might exceed the threshold
conf_mask[b][cur_confs>sil_thresh] = 0
if seen < -1:#6400:
tx0.fill_(0.5)
ty0.fill_(0.5)
tx1.fill_(0.5)
ty1.fill_(0.5)
tx2.fill_(0.5)
ty2.fill_(0.5)
tx3.fill_(0.5)
ty3.fill_(0.5)
tx4.fill_(0.5)
ty4.fill_(0.5)
tx5.fill_(0.5)
ty5.fill_(0.5)
tx6.fill_(0.5)
ty6.fill_(0.5)
tx7.fill_(0.5)
ty7.fill_(0.5)
tx8.fill_(0.5)
ty8.fill_(0.5)
coord_mask.fill_(1)
nGT = 0
nCorrect = 0
for b in range(nB):
for t in range(50):
if target[b][t*21+1] == 0:
if target[b][t*num_labels+1] == 0:
break
# Get gt box for the current label
nGT = nGT + 1
best_iou = 0.0
best_n = -1
min_dist = 10000
gx0 = target[b][t*21+1] * nW
gy0 = target[b][t*21+2] * nH
gi0 = int(gx0)
gj0 = int(gy0)
gx1 = target[b][t*21+3] * nW
gy1 = target[b][t*21+4] * nH
gx2 = target[b][t*21+5] * nW
gy2 = target[b][t*21+6] * nH
gx3 = target[b][t*21+7] * nW
gy3 = target[b][t*21+8] * nH
gx4 = target[b][t*21+9] * nW
gy4 = target[b][t*21+10] * nH
gx5 = target[b][t*21+11] * nW
gy5 = target[b][t*21+12] * nH
gx6 = target[b][t*21+13] * nW
gy6 = target[b][t*21+14] * nH
gx7 = target[b][t*21+15] * nW
gy7 = target[b][t*21+16] * nH
gx8 = target[b][t*21+17] * nW
gy8 = target[b][t*21+18] * nH
gx = list()
gy = list()
gt_box = list()
for i in range(num_keypoints):
gt_box.extend([target[b][t*num_labels+2*i+1], target[b][t*num_labels+2*i+2]])
gx.append(target[b][t*num_labels+2*i+1] * nW)
gy.append(target[b][t*num_labels+2*i+2] * nH)
if i == 0:
gi0 = int(gx[i])
gj0 = int(gy[i])
# Update masks
best_n = 0 # 1 anchor box
gt_box = [gx0/nW,gy0/nH,gx1/nW,gy1/nH,gx2/nW,gy2/nH,gx3/nW,gy3/nH,gx4/nW,gy4/nH,gx5/nW,gy5/nH,gx6/nW,gy6/nH,gx7/nW,gy7/nH,gx8/nW,gy8/nH]
pred_box = pred_corners[b*nAnchors+best_n*nPixels+gj0*nW+gi0]
conf = corner_confidence9(gt_box, pred_box)
conf = corner_confidence(gt_box, pred_box)
coord_mask[b][best_n][gj0][gi0] = 1
cls_mask[b][best_n][gj0][gi0] = 1
conf_mask[b][best_n][gj0][gi0] = object_scale
tx0[b][best_n][gj0][gi0] = target[b][t*21+1] * nW - gi0
ty0[b][best_n][gj0][gi0] = target[b][t*21+2] * nH - gj0
tx1[b][best_n][gj0][gi0] = target[b][t*21+3] * nW - gi0
ty1[b][best_n][gj0][gi0] = target[b][t*21+4] * nH - gj0
tx2[b][best_n][gj0][gi0] = target[b][t*21+5] * nW - gi0
ty2[b][best_n][gj0][gi0] = target[b][t*21+6] * nH - gj0
tx3[b][best_n][gj0][gi0] = target[b][t*21+7] * nW - gi0
ty3[b][best_n][gj0][gi0] = target[b][t*21+8] * nH - gj0
tx4[b][best_n][gj0][gi0] = target[b][t*21+9] * nW - gi0
ty4[b][best_n][gj0][gi0] = target[b][t*21+10] * nH - gj0
tx5[b][best_n][gj0][gi0] = target[b][t*21+11] * nW - gi0
ty5[b][best_n][gj0][gi0] = target[b][t*21+12] * nH - gj0
tx6[b][best_n][gj0][gi0] = target[b][t*21+13] * nW - gi0
ty6[b][best_n][gj0][gi0] = target[b][t*21+14] * nH - gj0
tx7[b][best_n][gj0][gi0] = target[b][t*21+15] * nW - gi0
ty7[b][best_n][gj0][gi0] = target[b][t*21+16] * nH - gj0
tx8[b][best_n][gj0][gi0] = target[b][t*21+17] * nW - gi0
ty8[b][best_n][gj0][gi0] = target[b][t*21+18] * nH - gj0
# Update targets
for i in range(num_keypoints):
txs[i][b][best_n][gj0][gi0] = gx[i]- gi0
tys[i][b][best_n][gj0][gi0] = gy[i]- gj0
tconf[b][best_n][gj0][gi0] = conf
tcls[b][best_n][gj0][gi0] = target[b][t*21]
if conf > 0.5:
tcls[b][best_n][gj0][gi0] = target[b][t*num_labels]
# Update recall during training
if conf > 0.5:
nCorrect = nCorrect + 1
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls
return nGT, nCorrect, coord_mask, conf_mask, cls_mask, txs, tys, tconf, tcls
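A minimal sketch (not part of the commit) of the flattened label layout that the generalized build_targets() above expects, using the default 9 keypoints and a hypothetical 13x13 output grid; the random values are placeholders:
```
# Each ground-truth object occupies num_labels = 2*9 + 3 = 21 values:
# [class, x0, y0, x1, y1, ..., x8, y8, w, h], coordinates normalized to [0, 1]
# (x0 == 0 is treated as "no more objects" when scanning the 50 label slots).
import torch

num_keypoints = 9
num_labels = 2 * num_keypoints + 3
nH, nW = 13, 13                              # hypothetical output grid size

target_row = torch.zeros(num_labels)
target_row[0] = 0                            # class id
target_row[1:2 * num_keypoints + 1] = torch.rand(2 * num_keypoints)  # keypoint projections

# The responsible grid cell comes from the first (centroid) keypoint, and each
# regression target is that keypoint's offset inside its cell, as in build_targets.
gx0, gy0 = float(target_row[1]) * nW, float(target_row[2]) * nH
gi0, gj0 = int(gx0), int(gy0)
tx0, ty0 = gx0 - gi0, gy0 - gj0
```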
class RegionLoss(nn.Module):
def __init__(self, num_classes=0, anchors=[], num_anchors=1):
def __init__(self, num_keypoints=9, num_classes=1, anchors=[], num_anchors=1, pretrain_num_epochs=15):
# Define the loss layer
super(RegionLoss, self).__init__()
self.num_classes = num_classes
self.anchors = anchors
self.num_anchors = num_anchors
self.anchor_step = len(anchors)/num_anchors
self.coord_scale = 1
self.noobject_scale = 1
self.object_scale = 5
self.class_scale = 1
self.thresh = 0.6
self.seen = 0
self.num_classes = num_classes
self.num_anchors = num_anchors # for single object pose estimation, there is only 1 trivial predictor (anchor)
self.num_keypoints = num_keypoints
self.coord_scale = 1
self.noobject_scale = 1
self.object_scale = 5
self.class_scale = 1
self.thresh = 0.6
self.seen = 0
self.pretrain_num_epochs = pretrain_num_epochs
def forward(self, output, target):
def forward(self, output, target, epoch):
# Parameters
t0 = time.time()
nB = output.data.size(0)
@ -172,83 +100,43 @@ class RegionLoss(nn.Module):
nC = self.num_classes
nH = output.data.size(2)
nW = output.data.size(3)
num_keypoints = self.num_keypoints
# Activation
output = output.view(nB, nA, (19+nC), nH, nW)
x0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW))
y0 = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW))
x1 = output.index_select(2, Variable(torch.cuda.LongTensor([2]))).view(nB, nA, nH, nW)
y1 = output.index_select(2, Variable(torch.cuda.LongTensor([3]))).view(nB, nA, nH, nW)
x2 = output.index_select(2, Variable(torch.cuda.LongTensor([4]))).view(nB, nA, nH, nW)
y2 = output.index_select(2, Variable(torch.cuda.LongTensor([5]))).view(nB, nA, nH, nW)
x3 = output.index_select(2, Variable(torch.cuda.LongTensor([6]))).view(nB, nA, nH, nW)
y3 = output.index_select(2, Variable(torch.cuda.LongTensor([7]))).view(nB, nA, nH, nW)
x4 = output.index_select(2, Variable(torch.cuda.LongTensor([8]))).view(nB, nA, nH, nW)
y4 = output.index_select(2, Variable(torch.cuda.LongTensor([9]))).view(nB, nA, nH, nW)
x5 = output.index_select(2, Variable(torch.cuda.LongTensor([10]))).view(nB, nA, nH, nW)
y5 = output.index_select(2, Variable(torch.cuda.LongTensor([11]))).view(nB, nA, nH, nW)
x6 = output.index_select(2, Variable(torch.cuda.LongTensor([12]))).view(nB, nA, nH, nW)
y6 = output.index_select(2, Variable(torch.cuda.LongTensor([13]))).view(nB, nA, nH, nW)
x7 = output.index_select(2, Variable(torch.cuda.LongTensor([14]))).view(nB, nA, nH, nW)
y7 = output.index_select(2, Variable(torch.cuda.LongTensor([15]))).view(nB, nA, nH, nW)
x8 = output.index_select(2, Variable(torch.cuda.LongTensor([16]))).view(nB, nA, nH, nW)
y8 = output.index_select(2, Variable(torch.cuda.LongTensor([17]))).view(nB, nA, nH, nW)
conf = F.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([18]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(19,19+nC-1,nC).long().cuda()))
output = output.view(nB, nA, (num_keypoints*2+1+nC), nH, nW)
x = list()
y = list()
x.append(torch.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([0]))).view(nB, nA, nH, nW)))
y.append(torch.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([1]))).view(nB, nA, nH, nW)))
for i in range(1,num_keypoints):
x.append(output.index_select(2, Variable(torch.cuda.LongTensor([2 * i + 0]))).view(nB, nA, nH, nW))
y.append(output.index_select(2, Variable(torch.cuda.LongTensor([2 * i + 1]))).view(nB, nA, nH, nW))
conf = torch.sigmoid(output.index_select(2, Variable(torch.cuda.LongTensor([2 * num_keypoints]))).view(nB, nA, nH, nW))
cls = output.index_select(2, Variable(torch.linspace(2*num_keypoints+1,2*num_keypoints+1+nC-1,nC).long().cuda()))
cls = cls.view(nB*nA, nC, nH*nW).transpose(1,2).contiguous().view(nB*nA*nH*nW, nC)
t1 = time.time()
# Create pred boxes
pred_corners = torch.cuda.FloatTensor(18, nB*nA*nH*nW)
pred_corners = torch.cuda.FloatTensor(2*num_keypoints, nB*nA*nH*nW)
grid_x = torch.linspace(0, nW-1, nW).repeat(nH,1).repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
grid_y = torch.linspace(0, nH-1, nH).repeat(nW,1).t().repeat(nB*nA, 1, 1).view(nB*nA*nH*nW).cuda()
pred_corners[0] = (x0.data + grid_x) / nW
pred_corners[1] = (y0.data + grid_y) / nH
pred_corners[2] = (x1.data + grid_x) / nW
pred_corners[3] = (y1.data + grid_y) / nH
pred_corners[4] = (x2.data + grid_x) / nW
pred_corners[5] = (y2.data + grid_y) / nH
pred_corners[6] = (x3.data + grid_x) / nW
pred_corners[7] = (y3.data + grid_y) / nH
pred_corners[8] = (x4.data + grid_x) / nW
pred_corners[9] = (y4.data + grid_y) / nH
pred_corners[10] = (x5.data + grid_x) / nW
pred_corners[11] = (y5.data + grid_y) / nH
pred_corners[12] = (x6.data + grid_x) / nW
pred_corners[13] = (y6.data + grid_y) / nH
pred_corners[14] = (x7.data + grid_x) / nW
pred_corners[15] = (y7.data + grid_y) / nH
pred_corners[16] = (x8.data + grid_x) / nW
pred_corners[17] = (y8.data + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,18)
for i in range(num_keypoints):
pred_corners[2 * i + 0] = (x[i].data.view_as(grid_x) + grid_x) / nW
pred_corners[2 * i + 1] = (y[i].data.view_as(grid_y) + grid_y) / nH
gpu_matrix = pred_corners.transpose(0,1).contiguous().view(-1,2*num_keypoints)
pred_corners = convert2cpu(gpu_matrix)
t2 = time.time()
# Build targets
nGT, nCorrect, coord_mask, conf_mask, cls_mask, tx0, tx1, tx2, tx3, tx4, tx5, tx6, tx7, tx8, ty0, ty1, ty2, ty3, ty4, ty5, ty6, ty7, ty8, tconf, tcls = \
build_targets(pred_corners, target.data, self.anchors, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
nGT, nCorrect, coord_mask, conf_mask, cls_mask, txs, tys, tconf, tcls = \
build_targets(pred_corners, target.data, num_keypoints, nA, nC, nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
cls_mask = (cls_mask == 1)
nProposals = int((conf > 0.25).sum().data[0])
tx0 = Variable(tx0.cuda())
ty0 = Variable(ty0.cuda())
tx1 = Variable(tx1.cuda())
ty1 = Variable(ty1.cuda())
tx2 = Variable(tx2.cuda())
ty2 = Variable(ty2.cuda())
tx3 = Variable(tx3.cuda())
ty3 = Variable(ty3.cuda())
tx4 = Variable(tx4.cuda())
ty4 = Variable(ty4.cuda())
tx5 = Variable(tx5.cuda())
ty5 = Variable(ty5.cuda())
tx6 = Variable(tx6.cuda())
ty6 = Variable(ty6.cuda())
tx7 = Variable(tx7.cuda())
ty7 = Variable(ty7.cuda())
tx8 = Variable(tx8.cuda())
ty8 = Variable(ty8.cuda())
for i in range(num_keypoints):
txs[i] = Variable(txs[i].cuda())
tys[i] = Variable(tys[i].cuda())
tconf = Variable(tconf.cuda())
tcls = Variable(tcls.view(-1)[cls_mask].long().cuda())
tcls = Variable(tcls[cls_mask].long().cuda())
coord_mask = Variable(coord_mask.cuda())
conf_mask = Variable(conf_mask.cuda().sqrt())
cls_mask = Variable(cls_mask.view(-1, 1).repeat(1,nC).cuda())
@ -256,33 +144,22 @@ class RegionLoss(nn.Module):
t3 = time.time()
# Create loss
loss_x0 = self.coord_scale * nn.MSELoss(size_average=False)(x0*coord_mask, tx0*coord_mask)/2.0
loss_y0 = self.coord_scale * nn.MSELoss(size_average=False)(y0*coord_mask, ty0*coord_mask)/2.0
loss_x1 = self.coord_scale * nn.MSELoss(size_average=False)(x1*coord_mask, tx1*coord_mask)/2.0
loss_y1 = self.coord_scale * nn.MSELoss(size_average=False)(y1*coord_mask, ty1*coord_mask)/2.0
loss_x2 = self.coord_scale * nn.MSELoss(size_average=False)(x2*coord_mask, tx2*coord_mask)/2.0
loss_y2 = self.coord_scale * nn.MSELoss(size_average=False)(y2*coord_mask, ty2*coord_mask)/2.0
loss_x3 = self.coord_scale * nn.MSELoss(size_average=False)(x3*coord_mask, tx3*coord_mask)/2.0
loss_y3 = self.coord_scale * nn.MSELoss(size_average=False)(y3*coord_mask, ty3*coord_mask)/2.0
loss_x4 = self.coord_scale * nn.MSELoss(size_average=False)(x4*coord_mask, tx4*coord_mask)/2.0
loss_y4 = self.coord_scale * nn.MSELoss(size_average=False)(y4*coord_mask, ty4*coord_mask)/2.0
loss_x5 = self.coord_scale * nn.MSELoss(size_average=False)(x5*coord_mask, tx5*coord_mask)/2.0
loss_y5 = self.coord_scale * nn.MSELoss(size_average=False)(y5*coord_mask, ty5*coord_mask)/2.0
loss_x6 = self.coord_scale * nn.MSELoss(size_average=False)(x6*coord_mask, tx6*coord_mask)/2.0
loss_y6 = self.coord_scale * nn.MSELoss(size_average=False)(y6*coord_mask, ty6*coord_mask)/2.0
loss_x7 = self.coord_scale * nn.MSELoss(size_average=False)(x7*coord_mask, tx7*coord_mask)/2.0
loss_y7 = self.coord_scale * nn.MSELoss(size_average=False)(y7*coord_mask, ty7*coord_mask)/2.0
loss_x8 = self.coord_scale * nn.MSELoss(size_average=False)(x8*coord_mask, tx8*coord_mask)/2.0
loss_y8 = self.coord_scale * nn.MSELoss(size_average=False)(y8*coord_mask, ty8*coord_mask)/2.0
loss_xs = list()
loss_ys = list()
for i in range(num_keypoints):
loss_xs.append(self.coord_scale * nn.MSELoss(size_average=False)(x[i]*coord_mask, txs[i]*coord_mask)/2.0)
loss_ys.append(self.coord_scale * nn.MSELoss(size_average=False)(y[i]*coord_mask, tys[i]*coord_mask)/2.0)
loss_conf = nn.MSELoss(size_average=False)(conf*conf_mask, tconf*conf_mask)/2.0
# loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
loss_cls = 0
loss_x = loss_x0 + loss_x1 + loss_x2 + loss_x3 + loss_x4 + loss_x5 + loss_x6 + loss_x7 + loss_x8
loss_y = loss_y0 + loss_y1 + loss_y2 + loss_y3 + loss_y4 + loss_y5 + loss_y6 + loss_y7 + loss_y8
if False:
loss = loss_x + loss_y + loss_conf + loss_cls
loss_x = np.sum(loss_xs)
loss_y = np.sum(loss_ys)
if epoch > self.pretrain_num_epochs:
loss = loss_x + loss_y + loss_conf # in single object pose estimation, there is no classification loss
else:
loss = loss_x + loss_y + loss_conf
# pretrain initially without confidence loss
# once the coordinate predictions get better, start training for confidence as well
loss = loss_x + loss_y
t4 = time.time()
if False:
@ -293,9 +170,6 @@ class RegionLoss(nn.Module):
print(' create loss : %f' % (t4 - t3))
print(' total : %f' % (t4 - t0))
if False:
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, cls %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss_cls.data[0], loss.data[0]))
else:
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss.data[0]))
print('%d: nGT %d, recall %d, proposals %d, loss: x %f, y %f, conf %f, total %f' % (self.seen, nGT, nCorrect, nProposals, loss_x.data[0], loss_y.data[0], loss_conf.data[0], loss.data[0]))
return loss
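A small sketch (not part of the commit) of the pretraining schedule applied in RegionLoss.forward above: for the first pretrain_num_epochs epochs (default 15, matching the --pretrain_num_epochs argument) only the coordinate terms are optimized, and the confidence term is switched on afterwards.
```
def combine_losses(loss_x, loss_y, loss_conf, epoch, pretrain_num_epochs=15):
    if epoch > pretrain_num_epochs:
        return loss_x + loss_y + loss_conf
    # pretrain: fit the coordinates first, add the confidence loss once they stabilize
    return loss_x + loss_y

# combine_losses(1.2, 0.9, 0.4, epoch=3)  -> 2.1 (confidence excluded)
# combine_losses(1.2, 0.9, 0.4, epoch=20) -> 2.5 (confidence included)
```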

train.py

@ -11,6 +11,7 @@ import os
import random
import math
import shutil
import argparse
from torchvision import datasets, transforms
from torch.autograd import Variable # Useful info about autograd: http://pytorch.org/docs/master/notes/autograd.html
@ -21,6 +22,9 @@ from region_loss import RegionLoss
from darknet import Darknet
from MeshPly import MeshPly
import warnings
warnings.filterwarnings("ignore")
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
@ -49,13 +53,15 @@ def train(epoch):
t0 = time.time()
# Get the dataloader for training dataset
train_loader = torch.utils.data.DataLoader(dataset.listDataset(trainlist, shape=(init_width, init_height),
shuffle=True,
transform=transforms.Compose([transforms.ToTensor(),]),
train=True,
seen=model.seen,
batch_size=batch_size,
num_workers=num_workers, bg_file_names=bg_file_names),
train_loader = torch.utils.data.DataLoader(dataset.listDataset(trainlist,
shape=(init_width, init_height),
shuffle=True,
transform=transforms.Compose([transforms.ToTensor(),]),
train=True,
seen=model.seen,
batch_size=batch_size,
num_workers=num_workers,
bg_file_names=bg_file_names),
batch_size=batch_size, shuffle=False, **kwargs)
# TRAINING
@ -88,7 +94,7 @@ def train(epoch):
model.seen = model.seen + data.data.size(0)
region_loss.seen = region_loss.seen + data.data.size(0)
# Compute loss, grow an array of losses for saving later on
loss = region_loss(output, target)
loss = region_loss(output, target, epoch)
training_iters.append(epoch * math.ceil(len(train_loader.dataset) / float(batch_size) ) + niter)
training_losses.append(convert2cpu(loss.data))
niter += 1
@ -147,7 +153,6 @@ def test(epoch, niter):
errs_trans = []
errs_angle = []
errs_corner2D = []
logging(" Testing...")
logging(" Number of test samples: %d" % len(test_loader.dataset))
notpredicted = 0
@ -165,34 +170,25 @@ def test(epoch, niter):
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
all_boxes = get_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors)
all_boxes = get_region_boxes(output, num_classes, num_keypoints)
t4 = time.time()
# Iterate through all batch elements
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
for box_pr, target in zip([all_boxes], [target[0]]):
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many object are present in the scene
num_gts = truths_length(truths)
truths = target.view(-1, num_keypoints*2+3)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction
for j in range(len(boxes)):
if boxes[j][18] > best_conf_est:
best_conf_est = boxes[j][18]
box_pr = boxes[j]
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
box_gt = list()
for j in range(1, 2*num_keypoints+1):
box_gt.append(truths[k][j])
box_gt.extend([1.0, 1.0])
box_gt.append(truths[k][0])
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt = np.array(np.reshape(box_gt[:num_keypoints*2], [num_keypoints, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:num_keypoints*2], [num_keypoints, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
@ -208,7 +204,6 @@ def test(epoch, niter):
R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))
# Compute errors
# Compute translation error
trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))
errs_trans.append(trans_dist)
@ -242,12 +237,13 @@ def test(epoch, niter):
t5 = time.time()
# Compute 2D projection, 6D pose and 5cm5degree scores
px_threshold = 5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc3d = len(np.where(np.array(errs_3d) <= vx_threshold)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
px_threshold = 5 # 5 pixel threshold for 2D reprojection error is standard in recent sota 6D object pose estimation works
eps = 1e-5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc3d = len(np.where(np.array(errs_3d) <= vx_threshold)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
mean_corner_err_2d = np.mean(errs_corner2D)
nts = float(testing_samples)
@ -276,24 +272,28 @@ def test(epoch, niter):
if __name__ == "__main__":
# Training settings
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
# Parse configuration files
parser = argparse.ArgumentParser(description='SingleShotPose')
parser.add_argument('--datacfg', type=str, default='cfg/ape.data') # data config
parser.add_argument('--modelcfg', type=str, default='cfg/yolo-pose.cfg') # network config
parser.add_argument('--initweightfile', type=str, default='cfg/darknet19_448.conv.23') # imagenet initialized weights
parser.add_argument('--pretrain_num_epochs', type=int, default=15) # how many epoch to pretrain
args = parser.parse_args()
datacfg = args.datacfg
modelcfg = args.modelcfg
initweightfile = args.initweightfile
pretrain_num_epochs = args.pretrain_num_epochs
# Parse configuration files
data_options = read_data_cfg(datacfg)
net_options = parse_cfg(cfgfile)[0]
net_options = parse_cfg(modelcfg)[0]
trainlist = data_options['train']
testlist = data_options['valid']
nsamples = file_lines(trainlist)
gpus = data_options['gpus'] # e.g. 0,1,2,3
gpus = '0'
gpus = data_options['gpus']
meshname = data_options['mesh']
num_workers = int(data_options['num_workers'])
backupdir = data_options['backup']
diam = float(data_options['diam'])
vx_threshold = diam * 0.1
vx_threshold = float(data_options['diam']) * 0.1 # threshold for the ADD metric
if not os.path.exists(backupdir):
makedirs(backupdir)
batch_size = int(net_options['batch'])
@ -301,49 +301,49 @@ if __name__ == "__main__":
learning_rate = float(net_options['learning_rate'])
momentum = float(net_options['momentum'])
decay = float(net_options['decay'])
steps = [float(step) for step in net_options['steps'].split(',')]
nsamples = file_lines(trainlist)
batch_size = int(net_options['batch'])
nbatches = nsamples / batch_size
steps = [float(step)*nbatches for step in net_options['steps'].split(',')]
scales = [float(scale) for scale in net_options['scales'].split(',')]
bg_file_names = get_all_files('VOCdevkit/VOC2012/JPEGImages')
# Train parameters
max_epochs = 700 # max_batches*batch_size/nsamples+1
use_cuda = True
seed = int(time.time())
eps = 1e-5
save_interval = 10 # epoches
dot_interval = 70 # batches
best_acc = -1
max_epochs = int(net_options['max_epochs'])
num_keypoints = int(net_options['num_keypoints'])
# Test parameters
conf_thresh = 0.1
nms_thresh = 0.4
iou_thresh = 0.5
im_width = 640
im_height = 480
im_width = int(data_options['width'])
im_height = int(data_options['height'])
fx = float(data_options['fx'])
fy = float(data_options['fy'])
u0 = float(data_options['u0'])
v0 = float(data_options['v0'])
test_width = int(net_options['test_width'])
test_height = int(net_options['test_height'])
# Specify which gpus to use
use_cuda = True
seed = int(time.time())
torch.manual_seed(seed)
if use_cuda:
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
# Specifiy the model and the loss
model = Darknet(cfgfile)
region_loss = model.loss
model = Darknet(modelcfg)
region_loss = RegionLoss(num_keypoints=9, num_classes=1, anchors=[], num_anchors=1, pretrain_num_epochs=15)
# Model settings
# model.load_weights(weightfile)
model.load_weights_until_last(weightfile)
model.load_weights_until_last(initweightfile)
model.print_network()
model.seen = 0
region_loss.iter = model.iter
region_loss.seen = model.seen
processed_batches = model.seen/batch_size
processed_batches = model.seen//batch_size
init_width = model.width
init_height = model.height
test_width = 672
test_height = 672
init_epoch = model.seen/nsamples
init_epoch = model.seen//nsamples
# Variable to save
training_iters = []
@ -359,16 +359,18 @@ if __name__ == "__main__":
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
internal_calibration = get_camera_intrinsic()
internal_calibration = get_camera_intrinsic(u0, v0, fx, fy)
# Specify the number of workers
kwargs = {'num_workers': num_workers, 'pin_memory': True} if use_cuda else {}
# Get the dataloader for test data
test_loader = torch.utils.data.DataLoader(dataset.listDataset(testlist, shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([transforms.ToTensor(),]),
train=False),
test_loader = torch.utils.data.DataLoader(dataset.listDataset(testlist,
shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([transforms.ToTensor(),]),
train=False),
batch_size=1, shuffle=False, **kwargs)
# Pass the model to GPU
@ -384,30 +386,25 @@ if __name__ == "__main__":
else:
params += [{'params': [value], 'weight_decay': decay*batch_size}]
optimizer = optim.SGD(model.parameters(), lr=learning_rate/batch_size, momentum=momentum, dampening=0, weight_decay=decay*batch_size)
# optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimization
evaluate = False
if evaluate:
logging('evaluating ...')
test(0, 0)
else:
for epoch in range(init_epoch, max_epochs):
# TRAIN
niter = train(epoch)
# TEST and SAVE
if (epoch % 10 == 0) and (epoch is not 0):
test(epoch, niter)
logging('save training stats to %s/costs.npz' % (backupdir))
np.savez(os.path.join(backupdir, "costs.npz"),
training_iters=training_iters,
training_losses=training_losses,
testing_iters=testing_iters,
testing_accuracies=testing_accuracies,
testing_errors_pixel=testing_errors_pixel,
testing_errors_angle=testing_errors_angle)
if (testing_accuracies[-1] > best_acc ):
best_acc = testing_accuracies[-1]
logging('best model so far!')
logging('save weights to %s/model.weights' % (backupdir))
model.save_weights('%s/model.weights' % (backupdir))
shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))
best_acc = -sys.maxsize
for epoch in range(init_epoch, max_epochs):
# TRAIN
niter = train(epoch)
# TEST and SAVE
if (epoch % 10 == 0) and (epoch > 15):
test(epoch, niter)
logging('save training stats to %s/costs.npz' % (backupdir))
np.savez(os.path.join(backupdir, "costs.npz"),
training_iters=training_iters,
training_losses=training_losses,
testing_iters=testing_iters,
testing_accuracies=testing_accuracies,
testing_errors_pixel=testing_errors_pixel,
testing_errors_angle=testing_errors_angle)
if (testing_accuracies[-1] > best_acc ):
best_acc = testing_accuracies[-1]
logging('best model so far!')
logging('save weights to %s/model.weights' % (backupdir))
model.save_weights('%s/model.weights' % (backupdir))
# shutil.copy2('%s/model.weights' % (backupdir), '%s/model_backup.weights' % (backupdir))
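For orientation, a sketch of the data-config keys this training script reads through read_data_cfg(). The key names come from the code above; the image size and intrinsics are the LINEMOD defaults that appear elsewhere in this diff, while the remaining values are hypothetical placeholders:
```
train  = LINEMOD/ape/train.txt     # placeholder path
valid  = LINEMOD/ape/test.txt      # placeholder path
mesh   = LINEMOD/ape/ape.ply
backup = backup/ape
name   = ape
diam   = 0.1                       # object diameter in meters (placeholder value)
gpus   = 0
num_workers = 10
width  = 640
height = 480
fx = 572.4114
fy = 573.5704
u0 = 325.2611
v0 = 242.0489
```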

utils.py

@ -13,6 +13,11 @@ from scipy import spatial
import struct
import imghdr
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
def get_all_files(directory):
files = []
@ -29,12 +34,8 @@ def calcAngularDistance(gt_rot, pr_rot):
trace = np.trace(rotDiff)
return np.rad2deg(np.arccos((trace-1.0)/2.0))
def get_camera_intrinsic():
K = np.zeros((3, 3), dtype='float64')
K[0, 0], K[0, 2] = 572.4114, 325.2611
K[1, 1], K[1, 2] = 573.5704, 242.0489
K[2, 2] = 1.
return K
def get_camera_intrinsic(u0, v0, fx, fy):
return np.array([[fx, 0.0, u0], [0.0, fy, v0], [0.0, 0.0, 1.0]])
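A short usage sketch (not part of the commit): the LINEMOD intrinsics that were previously hard-coded above are now read from the data config and passed in explicitly.
```
import numpy as np

def get_camera_intrinsic(u0, v0, fx, fy):
    return np.array([[fx, 0.0, u0], [0.0, fy, v0], [0.0, 0.0, 1.0]])

# LINEMOD default intrinsics (the values removed from the old hard-coded version)
K = get_camera_intrinsic(u0=325.2611, v0=242.0489, fx=572.4114, fy=573.5704)
```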
def compute_projection(points_3D, transformation, internal_calibration):
projections_2d = np.zeros((2, points_3D.shape[1]), dtype='float32')
@ -70,7 +71,6 @@ def get_3D_corners(vertices):
max_y = np.max(vertices[1,:])
min_z = np.min(vertices[2,:])
max_z = np.max(vertices[2,:])
corners = np.array([[min_x, min_y, min_z],
[min_x, min_y, max_z],
[min_x, max_y, min_z],
@ -87,35 +87,25 @@ def pnp(points_3D, points_2D, cameraMatrix):
try:
distCoeffs = pnp.distCoeffs
except:
distCoeffs = np.zeros((8, 1), dtype='float32')
distCoeffs = np.zeros((8, 1), dtype='float32')
assert points_3D.shape[0] == points_2D.shape[0], 'points 3D and points 2D must have same number of vertices'
_, R_exp, t = cv2.solvePnP(points_3D,
# points_2D,
np.ascontiguousarray(points_2D[:,:2]).reshape((-1,1,2)),
cameraMatrix,
distCoeffs)
# , None, None, False, cv2.SOLVEPNP_UPNP)
# R_exp, t, _ = cv2.solvePnPRansac(points_3D,
# points_2D,
# cameraMatrix,
# distCoeffs,
# reprojectionError=12.0)
#
distCoeffs)
R, _ = cv2.Rodrigues(R_exp)
# Rt = np.c_[R, t]
return R, t
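A minimal sketch (not part of the commit) mirroring the cv2.solvePnP call inside pnp() above with synthetic data; the 3D points stand in for the object centroid plus the 8 bounding-box corners, and the 2D points for their predicted pixel locations.
```
import numpy as np
import cv2

points_3D = np.random.rand(9, 3).astype('float32')                 # hypothetical 3D keypoints (meters)
points_2D = (np.random.rand(9, 2) * [640, 480]).astype('float32')  # hypothetical pixel predictions
K = np.array([[572.4114, 0., 325.2611],
              [0., 573.5704, 242.0489],
              [0., 0., 1.]], dtype='float32')
dist = np.zeros((8, 1), dtype='float32')

_, R_exp, t = cv2.solvePnP(points_3D,
                           np.ascontiguousarray(points_2D).reshape((-1, 1, 2)),
                           K, dist)
R, _ = cv2.Rodrigues(R_exp)    # 3x3 rotation and 3x1 translation of the object
```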
def get_2d_bb(box, size):
x = box[0]
y = box[1]
min_x = np.min(np.reshape(box, [9,2])[:,0])
max_x = np.max(np.reshape(box, [9,2])[:,0])
min_y = np.min(np.reshape(box, [9,2])[:,1])
max_y = np.max(np.reshape(box, [9,2])[:,1])
min_x = np.min(np.reshape(box, [-1,2])[:,0])
max_x = np.max(np.reshape(box, [-1,2])[:,0])
min_y = np.min(np.reshape(box, [-1,2])[:,1])
max_y = np.max(np.reshape(box, [-1,2])[:,1])
w = max_x - min_x
h = max_y - min_y
new_box = [x*size, y*size, w*size, h*size]
@ -145,89 +135,7 @@ def compute_2d_bb_from_orig_pix(pts, size):
new_box = [cx*size, cy*size, w*size, h*size]
return new_box
def bbox_iou(box1, box2, x1y1x2y2=False):
if x1y1x2y2:
mx = min(box1[0], box2[0])
Mx = max(box1[2], box2[2])
my = min(box1[1], box2[1])
My = max(box1[3], box2[3])
w1 = box1[2] - box1[0]
h1 = box1[3] - box1[1]
w2 = box2[2] - box2[0]
h2 = box2[3] - box2[1]
else:
mx = min(box1[0]-box1[2]/2.0, box2[0]-box2[2]/2.0)
Mx = max(box1[0]+box1[2]/2.0, box2[0]+box2[2]/2.0)
my = min(box1[1]-box1[3]/2.0, box2[1]-box2[3]/2.0)
My = max(box1[1]+box1[3]/2.0, box2[1]+box2[3]/2.0)
w1 = box1[2]
h1 = box1[3]
w2 = box2[2]
h2 = box2[3]
uw = Mx - mx
uh = My - my
cw = w1 + w2 - uw
ch = h1 + h2 - uh
carea = 0
if cw <= 0 or ch <= 0:
return 0.0
area1 = w1 * h1
area2 = w2 * h2
carea = cw * ch
uarea = area1 + area2 - carea
return carea/uarea
def corner_confidences(gt_corners, pr_corners, th=30, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (16 x nA), type: torch.FloatTensor
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (16 x nA), type: torch.FloatTensor
th : distance threshold, type: int
sharpness : sharpness of the exponential that assigns a confidence value to the distance
-----------
return : a torch.FloatTensor of shape (nA,) with 8 confidence values
'''
shape = gt_corners.size()
nA = shape[1]
dist = gt_corners - pr_corners
dist = dist.t().contiguous().view(nA, 8, 2)
dist[:, :, 0] = dist[:, :, 0] * im_width
dist[:, :, 1] = dist[:, :, 1] * im_height
eps = 1e-5
distthresh = torch.FloatTensor([th]).repeat(nA, 8)
dist = torch.sqrt(torch.sum((dist)**2, dim=2)).squeeze() # nA x 8
mask = (dist < distthresh).type(torch.FloatTensor)
conf = torch.exp(sharpness*(1 - dist/distthresh))-1 # mask * (torch.exp(math.log(2) * (1.0 - dist/rrt)) - 1)
conf0 = torch.exp(sharpness*(1 - torch.zeros(conf.size(0),1))) - 1
conf = conf / conf0.repeat(1, 8)
# conf = 1 - dist/distthresh
conf = mask * conf # nA x 8
mean_conf = torch.mean(conf, dim=1)
return mean_conf
def corner_confidence(gt_corners, pr_corners, th=30, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (16,) type: list
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (16,), type: list
th : distance threshold, type: int
sharpness : sharpness of the exponential that assigns a confidence value to the distance
-----------
return : a list of shape (8,) with 8 confidence values
'''
dist = torch.FloatTensor(gt_corners) - pr_corners
dist = dist.view(8, 2)
dist[:, 0] = dist[:, 0] * im_width
dist[:, 1] = dist[:, 1] * im_height
eps = 1e-5
dist = torch.sqrt(torch.sum((dist)**2, dim=1))
mask = (dist < th).type(torch.FloatTensor)
conf = torch.exp(sharpness * (1.0 - dist/th)) - 1
conf0 = torch.exp(torch.FloatTensor([sharpness])) - 1 + eps
conf = conf / conf0.repeat(8, 1)
# conf = 1.0 - dist/th
conf = mask * conf
return torch.mean(conf)
def corner_confidences9(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
def corner_confidences(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (2*num_keypoints x nA), type: torch.FloatTensor
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (2*num_keypoints x nA), type: torch.FloatTensor
th : distance threshold, type: int
@ -238,23 +146,25 @@ def corner_confidences9(gt_corners, pr_corners, th=80, sharpness=2, im_width=640
shape = gt_corners.size()
nA = shape[1]
dist = gt_corners - pr_corners
dist = dist.t().contiguous().view(nA, 9, 2)
num_el = dist.numel()
num_keypoints = num_el//(nA*2)
dist = dist.t().contiguous().view(nA, num_keypoints, 2)
dist[:, :, 0] = dist[:, :, 0] * im_width
dist[:, :, 1] = dist[:, :, 1] * im_height
eps = 1e-5
distthresh = torch.FloatTensor([th]).repeat(nA, 9)
distthresh = torch.FloatTensor([th]).repeat(nA, num_keypoints)
dist = torch.sqrt(torch.sum((dist)**2, dim=2)).squeeze() # nA x 9
mask = (dist < distthresh).type(torch.FloatTensor)
conf = torch.exp(sharpness*(1 - dist/distthresh))-1 # mask * (torch.exp(math.log(2) * (1.0 - dist/rrt)) - 1)
conf0 = torch.exp(sharpness*(1 - torch.zeros(conf.size(0),1))) - 1
conf = conf / conf0.repeat(1, 9)
conf = conf / conf0.repeat(1, num_keypoints)
# conf = 1 - dist/distthresh
conf = mask * conf # nA x 9
mean_conf = torch.mean(conf, dim=1)
return mean_conf
def corner_confidence9(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
def corner_confidence(gt_corners, pr_corners, th=80, sharpness=2, im_width=640, im_height=480):
''' gt_corners: Ground-truth 2D projections of the 3D bounding box corners, shape: (2*num_keypoints,), type: list
pr_corners: Prediction for the 2D projections of the 3D bounding box corners, shape: (2*num_keypoints,), type: list
th : distance threshold, type: int
@ -263,7 +173,8 @@ def corner_confidence9(gt_corners, pr_corners, th=80, sharpness=2, im_width=640,
return    : the mean confidence value over the num_keypoints keypoints
'''
dist = torch.FloatTensor(gt_corners) - pr_corners
dist = dist.view(9, 2)
num_keypoints = dist.numel()//2
dist = dist.view(num_keypoints, 2)
dist[:, 0] = dist[:, 0] * im_width
dist[:, 1] = dist[:, 1] * im_height
eps = 1e-5
@ -271,8 +182,7 @@ def corner_confidence9(gt_corners, pr_corners, th=80, sharpness=2, im_width=640,
mask = (dist < th).type(torch.FloatTensor)
conf = torch.exp(sharpness * (1.0 - dist/th)) - 1
conf0 = torch.exp(torch.FloatTensor([sharpness])) - 1 + eps
conf = conf / conf0.repeat(9, 1)
# conf = 1.0 - dist/th
conf = conf / conf0.repeat(num_keypoints, 1)
conf = mask * conf
return torch.mean(conf)
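A worked sketch (not part of the commit) of the per-keypoint confidence that corner_confidence/corner_confidences assign as a function of the pixel distance d to the ground truth, using the default th=80 px and sharpness=2:
```
import math

def keypoint_conf(d, th=80, sharpness=2):
    if d >= th:
        return 0.0
    return (math.exp(sharpness * (1.0 - d / th)) - 1) / (math.exp(sharpness) - 1)

# keypoint_conf(0)  -> 1.0  (perfect prediction)
# keypoint_conf(40) -> ~0.27
# keypoint_conf(80) -> 0.0  (beyond the distance threshold)
```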
@ -284,27 +194,6 @@ def softmax(x):
x = x/x.sum()
return x
def nms(boxes, nms_thresh):
if len(boxes) == 0:
return boxes
det_confs = torch.zeros(len(boxes))
for i in range(len(boxes)):
det_confs[i] = 1-boxes[i][4]
_,sortIds = torch.sort(det_confs)
out_boxes = []
for i in range(len(boxes)):
box_i = boxes[sortIds[i]]
if box_i[4] > 0:
out_boxes.append(box_i)
for j in range(i+1, len(boxes)):
box_j = boxes[sortIds[j]]
if bbox_iou(box_i, box_j, x1y1x2y2=False) > nms_thresh:
#print(box_i, box_j, bbox_iou(box_i, box_j, x1y1x2y2=False))
box_j[4] = 0
return out_boxes
def fix_corner_order(corners2D_gt):
corners2D_gt_corrected = np.zeros((9, 2), dtype='float32')
corners2D_gt_corrected[0, :] = corners2D_gt[0, :]
@ -324,44 +213,33 @@ def convert2cpu(gpu_matrix):
def convert2cpu_long(gpu_matrix):
return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)
def get_region_boxes(output, conf_thresh, num_classes, only_objectness=1, validation=False):
def get_region_boxes(output, num_classes, num_keypoints, only_objectness=1, validation=True):
# Parameters
anchor_dim = 1
if output.dim() == 3:
output = output.unsqueeze(0)
batch = output.size(0)
assert(output.size(1) == (19+num_classes)*anchor_dim)
assert(output.size(1) == (2*num_keypoints+1+num_classes)*anchor_dim)
h = output.size(2)
w = output.size(3)
# Activation
t0 = time.time()
all_boxes = []
max_conf = -100000
output = output.view(batch*anchor_dim, 19+num_classes, h*w).transpose(0,1).contiguous().view(19+num_classes, batch*anchor_dim*h*w)
max_conf = -sys.maxsize
output = output.view(batch*anchor_dim, 2*num_keypoints+1+num_classes, h*w).transpose(0,1).contiguous().view(2*num_keypoints+1+num_classes, batch*anchor_dim*h*w)
grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*anchor_dim, 1, 1).view(batch*anchor_dim*h*w).cuda()
grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*anchor_dim, 1, 1).view(batch*anchor_dim*h*w).cuda()
xs0 = torch.sigmoid(output[0]) + grid_x
ys0 = torch.sigmoid(output[1]) + grid_y
xs1 = output[2] + grid_x
ys1 = output[3] + grid_y
xs2 = output[4] + grid_x
ys2 = output[5] + grid_y
xs3 = output[6] + grid_x
ys3 = output[7] + grid_y
xs4 = output[8] + grid_x
ys4 = output[9] + grid_y
xs5 = output[10] + grid_x
ys5 = output[11] + grid_y
xs6 = output[12] + grid_x
ys6 = output[13] + grid_y
xs7 = output[14] + grid_x
ys7 = output[15] + grid_y
xs8 = output[16] + grid_x
ys8 = output[17] + grid_y
det_confs = torch.sigmoid(output[18])
cls_confs = torch.nn.Softmax()(Variable(output[19:19+num_classes].transpose(0,1))).data
xs = list()
ys = list()
xs.append(torch.sigmoid(output[0]) + grid_x)
ys.append(torch.sigmoid(output[1]) + grid_y)
for j in range(1,num_keypoints):
xs.append(output[2*j + 0] + grid_x)
ys.append(output[2*j + 1] + grid_y)
det_confs = torch.sigmoid(output[2*num_keypoints])
cls_confs = torch.nn.Softmax()(Variable(output[2*num_keypoints+1:2*num_keypoints+1+num_classes].transpose(0,1))).data
cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
cls_max_confs = cls_max_confs.view(-1)
cls_max_ids = cls_max_ids.view(-1)
@ -373,32 +251,15 @@ def get_region_boxes(output, conf_thresh, num_classes, only_objectness=1, valida
det_confs = convert2cpu(det_confs)
cls_max_confs = convert2cpu(cls_max_confs)
cls_max_ids = convert2cpu_long(cls_max_ids)
xs0 = convert2cpu(xs0)
ys0 = convert2cpu(ys0)
xs1 = convert2cpu(xs1)
ys1 = convert2cpu(ys1)
xs2 = convert2cpu(xs2)
ys2 = convert2cpu(ys2)
xs3 = convert2cpu(xs3)
ys3 = convert2cpu(ys3)
xs4 = convert2cpu(xs4)
ys4 = convert2cpu(ys4)
xs5 = convert2cpu(xs5)
ys5 = convert2cpu(ys5)
xs6 = convert2cpu(xs6)
ys6 = convert2cpu(ys6)
xs7 = convert2cpu(xs7)
ys7 = convert2cpu(ys7)
xs8 = convert2cpu(xs8)
ys8 = convert2cpu(ys8)
for j in range(num_keypoints):
xs[j] = convert2cpu(xs[j])
ys[j] = convert2cpu(ys[j])
if validation:
cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
t2 = time.time()
# Boxes filter
for b in range(batch):
boxes = []
max_conf = -1
for cy in range(h):
for cx in range(w):
for i in range(anchor_dim):
@ -411,66 +272,20 @@ def get_region_boxes(output, conf_thresh, num_classes, only_objectness=1, valida
if conf > max_conf:
max_conf = conf
max_ind = ind
if conf > conf_thresh:
bcx0 = xs0[ind]
bcy0 = ys0[ind]
bcx1 = xs1[ind]
bcy1 = ys1[ind]
bcx2 = xs2[ind]
bcy2 = ys2[ind]
bcx3 = xs3[ind]
bcy3 = ys3[ind]
bcx4 = xs4[ind]
bcy4 = ys4[ind]
bcx5 = xs5[ind]
bcy5 = ys5[ind]
bcx6 = xs6[ind]
bcy6 = ys6[ind]
bcx7 = xs7[ind]
bcy7 = ys7[ind]
bcx8 = xs8[ind]
bcy8 = ys8[ind]
bcx = list()
bcy = list()
for j in range(num_keypoints):
bcx.append(xs[j][ind])
bcy.append(ys[j][ind])
cls_max_conf = cls_max_confs[ind]
cls_max_id = cls_max_ids[ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
if (not only_objectness) and validation:
for c in range(num_classes):
tmp_conf = cls_confs[ind][c]
if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh:
box.append(tmp_conf)
box.append(c)
boxes.append(box)
if len(boxes) == 0:
bcx0 = xs0[max_ind]
bcy0 = ys0[max_ind]
bcx1 = xs1[max_ind]
bcy1 = ys1[max_ind]
bcx2 = xs2[max_ind]
bcy2 = ys2[max_ind]
bcx3 = xs3[max_ind]
bcy3 = ys3[max_ind]
bcx4 = xs4[max_ind]
bcy4 = ys4[max_ind]
bcx5 = xs5[max_ind]
bcy5 = ys5[max_ind]
bcx6 = xs6[max_ind]
bcy6 = ys6[max_ind]
bcx7 = xs7[max_ind]
bcy7 = ys7[max_ind]
bcx8 = xs8[max_ind]
bcy8 = ys8[max_ind]
cls_max_conf = cls_max_confs[max_ind]
cls_max_id = cls_max_ids[max_ind]
det_conf = det_confs[max_ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
boxes.append(box)
all_boxes.append(boxes)
else:
all_boxes.append(boxes)
all_boxes.append(boxes)
box = list()
for j in range(num_keypoints):
box.append(bcx[j]/w)
box.append(bcy[j]/h)
box.append(det_conf)
box.append(cls_max_conf)
box.append(cls_max_id)
t3 = time.time()
if False:
print('---------------------------------')
@ -478,424 +293,25 @@ def get_region_boxes(output, conf_thresh, num_classes, only_objectness=1, valida
print(' gpu to cpu : %f' % (t2-t1))
print(' boxes filter : %f' % (t3-t2))
print('---------------------------------')
return all_boxes
return box
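A small sketch (not part of the commit) of how the single box returned by get_region_boxes() is consumed in train.py's test() above: the first 2*num_keypoints entries are normalized keypoint coordinates, followed by the detection confidence, class confidence and class id; the values below are random placeholders.
```
import numpy as np

num_keypoints, im_width, im_height = 9, 640, 480
box = list(np.random.rand(2 * num_keypoints)) + [0.9, 1.0, 0]   # hypothetical prediction

corners2D_pr = np.array(box[:2 * num_keypoints], dtype='float32').reshape(num_keypoints, 2)
corners2D_pr[:, 0] *= im_width    # denormalize x to pixels
corners2D_pr[:, 1] *= im_height   # denormalize y to pixels
```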
def get_corresponding_region_boxes(output, conf_thresh, num_classes, anchors, num_anchors, correspondingclass, only_objectness=1, validation=False):
# Parameters
anchor_step = len(anchors)/num_anchors
if output.dim() == 3:
output = output.unsqueeze(0)
batch = output.size(0)
assert(output.size(1) == (19+num_classes)*num_anchors)
h = output.size(2)
w = output.size(3)
# Activation
t0 = time.time()
all_boxes = []
max_conf = -100000
max_cls_conf = -100000
output = output.view(batch*num_anchors, 19+num_classes, h*w).transpose(0,1).contiguous().view(19+num_classes, batch*num_anchors*h*w)
grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
xs0 = torch.sigmoid(output[0]) + grid_x
ys0 = torch.sigmoid(output[1]) + grid_y
xs1 = output[2] + grid_x
ys1 = output[3] + grid_y
xs2 = output[4] + grid_x
ys2 = output[5] + grid_y
xs3 = output[6] + grid_x
ys3 = output[7] + grid_y
xs4 = output[8] + grid_x
ys4 = output[9] + grid_y
xs5 = output[10] + grid_x
ys5 = output[11] + grid_y
xs6 = output[12] + grid_x
ys6 = output[13] + grid_y
xs7 = output[14] + grid_x
ys7 = output[15] + grid_y
xs8 = output[16] + grid_x
ys8 = output[17] + grid_y
det_confs = torch.sigmoid(output[18])
cls_confs = torch.nn.Softmax()(Variable(output[19:19+num_classes].transpose(0,1))).data
cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
cls_max_confs = cls_max_confs.view(-1)
cls_max_ids = cls_max_ids.view(-1)
t1 = time.time()
# GPU to CPU
sz_hw = h*w
sz_hwa = sz_hw*num_anchors
det_confs = convert2cpu(det_confs)
cls_max_confs = convert2cpu(cls_max_confs)
cls_max_ids = convert2cpu_long(cls_max_ids)
xs0 = convert2cpu(xs0)
ys0 = convert2cpu(ys0)
xs1 = convert2cpu(xs1)
ys1 = convert2cpu(ys1)
xs2 = convert2cpu(xs2)
ys2 = convert2cpu(ys2)
xs3 = convert2cpu(xs3)
ys3 = convert2cpu(ys3)
xs4 = convert2cpu(xs4)
ys4 = convert2cpu(ys4)
xs5 = convert2cpu(xs5)
ys5 = convert2cpu(ys5)
xs6 = convert2cpu(xs6)
ys6 = convert2cpu(ys6)
xs7 = convert2cpu(xs7)
ys7 = convert2cpu(ys7)
xs8 = convert2cpu(xs8)
ys8 = convert2cpu(ys8)
if validation:
cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
t2 = time.time()
# Boxes filter
for b in range(batch):
boxes = []
max_conf = -1
for cy in range(h):
for cx in range(w):
for i in range(num_anchors):
ind = b*sz_hwa + i*sz_hw + cy*w + cx
det_conf = det_confs[ind]
if only_objectness:
conf = det_confs[ind]
else:
conf = det_confs[ind] * cls_max_confs[ind]
if (det_confs[ind] > max_conf) and (cls_confs[ind, correspondingclass] > max_cls_conf):
max_conf = det_confs[ind]
max_cls_conf = cls_confs[ind, correspondingclass]
max_ind = ind
if conf > conf_thresh:
bcx0 = xs0[ind]
bcy0 = ys0[ind]
bcx1 = xs1[ind]
bcy1 = ys1[ind]
bcx2 = xs2[ind]
bcy2 = ys2[ind]
bcx3 = xs3[ind]
bcy3 = ys3[ind]
bcx4 = xs4[ind]
bcy4 = ys4[ind]
bcx5 = xs5[ind]
bcy5 = ys5[ind]
bcx6 = xs6[ind]
bcy6 = ys6[ind]
bcx7 = xs7[ind]
bcy7 = ys7[ind]
bcx8 = xs8[ind]
bcy8 = ys8[ind]
cls_max_conf = cls_max_confs[ind]
cls_max_id = cls_max_ids[ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
if (not only_objectness) and validation:
for c in range(num_classes):
tmp_conf = cls_confs[ind][c]
if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh:
box.append(tmp_conf)
box.append(c)
boxes.append(box)
boxesnp = np.array(boxes)
if (len(boxes) == 0) or (not (correspondingclass in boxesnp[:,20])):
bcx0 = xs0[max_ind]
bcy0 = ys0[max_ind]
bcx1 = xs1[max_ind]
bcy1 = ys1[max_ind]
bcx2 = xs2[max_ind]
bcy2 = ys2[max_ind]
bcx3 = xs3[max_ind]
bcy3 = ys3[max_ind]
bcx4 = xs4[max_ind]
bcy4 = ys4[max_ind]
bcx5 = xs5[max_ind]
bcy5 = ys5[max_ind]
bcx6 = xs6[max_ind]
bcy6 = ys6[max_ind]
bcx7 = xs7[max_ind]
bcy7 = ys7[max_ind]
bcx8 = xs8[max_ind]
bcy8 = ys8[max_ind]
cls_max_conf = max_cls_conf # cls_max_confs[max_ind]
cls_max_id = correspondingclass # cls_max_ids[max_ind]
det_conf = max_conf # det_confs[max_ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
boxes.append(box)
# print(boxes)
all_boxes.append(boxes)
else:
all_boxes.append(boxes)
t3 = time.time()
if False:
print('---------------------------------')
print('matrix computation : %f' % (t1-t0))
print(' gpu to cpu : %f' % (t2-t1))
print(' boxes filter : %f' % (t3-t2))
print('---------------------------------')
return all_boxes
def get_boxes(output, conf_thresh, num_classes, anchors, num_anchors, correspondingclass, only_objectness=1, validation=False):
# Parameters
anchor_step = len(anchors)/num_anchors
if output.dim() == 3:
output = output.unsqueeze(0)
batch = output.size(0)
assert(output.size(1) == (19+num_classes)*num_anchors)
h = output.size(2)
w = output.size(3)
# Activation
t0 = time.time()
all_boxes = []
max_conf = -100000
max_cls_conf = -100000
output = output.view(batch*num_anchors, 19+num_classes, h*w).transpose(0,1).contiguous().view(19+num_classes, batch*num_anchors*h*w)
grid_x = torch.linspace(0, w-1, w).repeat(h,1).repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
grid_y = torch.linspace(0, h-1, h).repeat(w,1).t().repeat(batch*num_anchors, 1, 1).view(batch*num_anchors*h*w).cuda()
xs0 = torch.sigmoid(output[0]) + grid_x
ys0 = torch.sigmoid(output[1]) + grid_y
xs1 = output[2] + grid_x
ys1 = output[3] + grid_y
xs2 = output[4] + grid_x
ys2 = output[5] + grid_y
xs3 = output[6] + grid_x
ys3 = output[7] + grid_y
xs4 = output[8] + grid_x
ys4 = output[9] + grid_y
xs5 = output[10] + grid_x
ys5 = output[11] + grid_y
xs6 = output[12] + grid_x
ys6 = output[13] + grid_y
xs7 = output[14] + grid_x
ys7 = output[15] + grid_y
xs8 = output[16] + grid_x
ys8 = output[17] + grid_y
det_confs = torch.sigmoid(output[18])
cls_confs = torch.nn.Softmax()(Variable(output[19:19+num_classes].transpose(0,1))).data
cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
cls_max_confs = cls_max_confs.view(-1)
cls_max_ids = cls_max_ids.view(-1)
t1 = time.time()
# GPU to CPU
sz_hw = h*w
sz_hwa = sz_hw*num_anchors
det_confs = convert2cpu(det_confs)
cls_max_confs = convert2cpu(cls_max_confs)
cls_max_ids = convert2cpu_long(cls_max_ids)
xs0 = convert2cpu(xs0)
ys0 = convert2cpu(ys0)
xs1 = convert2cpu(xs1)
ys1 = convert2cpu(ys1)
xs2 = convert2cpu(xs2)
ys2 = convert2cpu(ys2)
xs3 = convert2cpu(xs3)
ys3 = convert2cpu(ys3)
xs4 = convert2cpu(xs4)
ys4 = convert2cpu(ys4)
xs5 = convert2cpu(xs5)
ys5 = convert2cpu(ys5)
xs6 = convert2cpu(xs6)
ys6 = convert2cpu(ys6)
xs7 = convert2cpu(xs7)
ys7 = convert2cpu(ys7)
xs8 = convert2cpu(xs8)
ys8 = convert2cpu(ys8)
if validation:
cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
t2 = time.time()
# Boxes filter
for b in range(batch):
boxes = []
max_conf = -1
for cy in range(h):
for cx in range(w):
for i in range(num_anchors):
ind = b*sz_hwa + i*sz_hw + cy*w + cx
det_conf = det_confs[ind]
if only_objectness:
conf = det_confs[ind]
else:
conf = det_confs[ind] * cls_max_confs[ind]
if (conf > max_conf) and (cls_confs[ind, correspondingclass] > max_cls_conf):
max_conf = conf
max_cls_conf = cls_confs[ind, correspondingclass]
max_ind = ind
if conf > conf_thresh:
bcx0 = xs0[ind]
bcy0 = ys0[ind]
bcx1 = xs1[ind]
bcy1 = ys1[ind]
bcx2 = xs2[ind]
bcy2 = ys2[ind]
bcx3 = xs3[ind]
bcy3 = ys3[ind]
bcx4 = xs4[ind]
bcy4 = ys4[ind]
bcx5 = xs5[ind]
bcy5 = ys5[ind]
bcx6 = xs6[ind]
bcy6 = ys6[ind]
bcx7 = xs7[ind]
bcy7 = ys7[ind]
bcx8 = xs8[ind]
bcy8 = ys8[ind]
cls_max_conf = cls_max_confs[ind]
cls_max_id = cls_max_ids[ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
if (not only_objectness) and validation:
for c in range(num_classes):
tmp_conf = cls_confs[ind][c]
if c != cls_max_id and det_confs[ind]*tmp_conf > conf_thresh:
box.append(tmp_conf)
box.append(c)
boxes.append(box)
boxesnp = np.array(boxes)
if (len(boxes) == 0) or (not (correspondingclass in boxesnp[:,20])):
bcx0 = xs0[max_ind]
bcy0 = ys0[max_ind]
bcx1 = xs1[max_ind]
bcy1 = ys1[max_ind]
bcx2 = xs2[max_ind]
bcy2 = ys2[max_ind]
bcx3 = xs3[max_ind]
bcy3 = ys3[max_ind]
bcx4 = xs4[max_ind]
bcy4 = ys4[max_ind]
bcx5 = xs5[max_ind]
bcy5 = ys5[max_ind]
bcx6 = xs6[max_ind]
bcy6 = ys6[max_ind]
bcx7 = xs7[max_ind]
bcy7 = ys7[max_ind]
bcx8 = xs8[max_ind]
bcy8 = ys8[max_ind]
cls_max_conf = max_cls_conf # cls_max_confs[max_ind]
cls_max_id = correspondingclass # cls_max_ids[max_ind]
det_conf = det_confs[max_ind]
box = [bcx0/w, bcy0/h, bcx1/w, bcy1/h, bcx2/w, bcy2/h, bcx3/w, bcy3/h, bcx4/w, bcy4/h, bcx5/w, bcy5/h, bcx6/w, bcy6/h, bcx7/w, bcy7/h, bcx8/w, bcy8/h, det_conf, cls_max_conf, cls_max_id]
boxes.append(box)
# print(boxes)
all_boxes.append(boxes)
else:
all_boxes.append(boxes)
t3 = time.time()
if False:
print('---------------------------------')
print('matrix computation : %f' % (t1-t0))
print(' gpu to cpu : %f' % (t2-t1))
print(' boxes filter : %f' % (t3-t2))
print('---------------------------------')
return all_boxes
def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
import cv2
colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]]);
def get_color(c, x, max_val):
ratio = float(x)/max_val * 5
i = int(math.floor(ratio))
j = int(math.ceil(ratio))
ratio = ratio - i
r = (1-ratio) * colors[i][c] + ratio*colors[j][c]
return int(r*255)
width = img.shape[1]
height = img.shape[0]
for i in range(len(boxes)):
box = boxes[i]
x1 = int(round((box[0] - box[2]/2.0) * width))
y1 = int(round((box[1] - box[3]/2.0) * height))
x2 = int(round((box[0] + box[2]/2.0) * width))
y2 = int(round((box[1] + box[3]/2.0) * height))
if color:
rgb = color
else:
rgb = (255, 0, 0)
if len(box) >= 7 and class_names:
cls_conf = box[5]
cls_id = box[6]
print('%s: %f' % (class_names[cls_id], cls_conf))
classes = len(class_names)
offset = cls_id * 123457 % classes
red = get_color(2, offset, classes)
green = get_color(1, offset, classes)
blue = get_color(0, offset, classes)
if color is None:
rgb = (red, green, blue)
img = cv2.putText(img, class_names[cls_id], (x1,y1), cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 1)
img = cv2.rectangle(img, (x1,y1), (x2,y2), rgb, 1)
if savename:
print("save plot results to %s" % savename)
cv2.imwrite(savename, img)
return img
def plot_boxes(img, boxes, savename=None, class_names=None):
colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]]);
def get_color(c, x, max_val):
ratio = float(x)/max_val * 5
i = int(math.floor(ratio))
j = int(math.ceil(ratio))
ratio = ratio - i
r = (1-ratio) * colors[i][c] + ratio*colors[j][c]
return int(r*255)
width = img.width
height = img.height
draw = ImageDraw.Draw(img)
for i in range(len(boxes)):
box = boxes[i]
x1 = (box[0] - box[2]/2.0) * width
y1 = (box[1] - box[3]/2.0) * height
x2 = (box[0] + box[2]/2.0) * width
y2 = (box[1] + box[3]/2.0) * height
rgb = (255, 0, 0)
if len(box) >= 7 and class_names:
cls_conf = box[5]
cls_id = box[6]
print('%s: %f' % (class_names[cls_id], cls_conf))
classes = len(class_names)
offset = cls_id * 123457 % classes
red = get_color(2, offset, classes)
green = get_color(1, offset, classes)
blue = get_color(0, offset, classes)
rgb = (red, green, blue)
draw.text((x1, y1), class_names[cls_id], fill=rgb)
draw.rectangle([x1, y1, x2, y2], outline = rgb)
if savename:
print("save plot results to %s" % savename)
img.save(savename)
return img
def read_truths(lab_path):
def read_truths(lab_path, num_keypoints=9):
num_labels = 2*num_keypoints+3 # +2 for width, height, +1 for class label
if os.path.getsize(lab_path):
truths = np.loadtxt(lab_path)
truths = truths.reshape(truths.size/21, 21) # to avoid single truth problem
truths = truths.reshape(truths.size//num_labels, num_labels) # to avoid single truth problem
return truths
else:
return np.array([])
def read_truths_args(lab_path, min_box_scale):
def read_truths_args(lab_path, num_keypoints=9):
num_labels = 2 * num_keypoints + 1
truths = read_truths(lab_path)
new_truths = []
for i in range(truths.shape[0]):
new_truths.append([truths[i][0], truths[i][1], truths[i][2], truths[i][3], truths[i][4],
truths[i][5], truths[i][6], truths[i][7], truths[i][8], truths[i][9], truths[i][10],
truths[i][11], truths[i][12], truths[i][13], truths[i][14], truths[i][15], truths[i][16], truths[i][17], truths[i][18]])
for j in range(num_labels):
new_truths.append(truths[i][j])
return np.array(new_truths)
def read_pose(lab_path):
@ -924,59 +340,9 @@ def image2torch(img):
img = img.float().div(255.0)
return img
def do_detect(model, img, conf_thresh, nms_thresh, use_cuda=1):
model.eval()
t0 = time.time()
if isinstance(img, Image.Image):
width = img.width
height = img.height
img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes()))
img = img.view(height, width, 3).transpose(0,1).transpose(0,2).contiguous()
img = img.view(1, 3, height, width)
img = img.float().div(255.0)
elif type(img) == np.ndarray: # cv2 image
img = torch.from_numpy(img.transpose(2,0,1)).float().div(255.0).unsqueeze(0)
else:
print("unknow image type")
exit(-1)
t1 = time.time()
if use_cuda:
img = img.cuda()
img = torch.autograd.Variable(img)
t2 = time.time()
output = model(img)
output = output.data
#for j in range(100):
# sys.stdout.write('%f ' % (output.storage()[j]))
#print('')
t3 = time.time()
boxes = get_region_boxes(output, conf_thresh, model.num_classes, model.anchors, model.num_anchors)[0]
#for j in range(len(boxes)):
# print(boxes[j])
t4 = time.time()
boxes = nms(boxes, nms_thresh)
t5 = time.time()
if False:
print('-----------------------------------')
print(' image to tensor : %f' % (t1 - t0))
print(' tensor to cuda : %f' % (t2 - t1))
print(' predict : %f' % (t3 - t2))
print('get_region_boxes : %f' % (t4 - t3))
print(' nms : %f' % (t5 - t4))
print(' total : %f' % (t5 - t0))
print('-----------------------------------')
return boxes
def read_data_cfg(datacfg):
options = dict()
options['gpus'] = '0,1,2,3'
options['gpus'] = '0'
options['num_workers'] = '10'
with open(datacfg, 'r') as fp:
lines = fp.readlines()
@ -1008,7 +374,7 @@ def file_lines(thefilepath):
buffer = thefile.read(8192*1024)
if not buffer:
break
count += buffer.count('\n')
count += buffer.count(b'\n')
thefile.close( )
return count

valid.ipynb

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -31,68 +31,46 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [
{
"ename": "IOError",
"evalue": "[Errno 2] No such file or directory: 'LINEMOD/ape/ape.ply'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIOError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-7-cd23ddeac3d5>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 272\u001b[0m \u001b[0mcfgfile\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'cfg/yolo-pose.cfg'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 273\u001b[0m \u001b[0mweightfile\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'backup/ape/model_backup.weights'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 274\u001b[0;31m \u001b[0mvalid\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdatacfg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcfgfile\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mweightfile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-7-cd23ddeac3d5>\u001b[0m in \u001b[0;36mvalid\u001b[0;34m(datacfg, cfgfile, weightfile)\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0;31m# Read object model information, get 3D bounding box corners\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0mmesh\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMeshPly\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmeshname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0mvertices\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mc_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmesh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mones\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmesh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtranspose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0mcorners3D\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_3D_corners\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvertices\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/cvlabdata1/home/btekin/ope/singleshotpose_release/MeshPly.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, filename, color)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'r'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvertices\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mIOError\u001b[0m: [Errno 2] No such file or directory: 'LINEMOD/ape/ape.ply'"
]
}
],
"outputs": [],
"source": [
"def valid(datacfg, cfgfile, weightfile):\n",
" def truths_length(truths):\n",
" for i in range(50):\n",
"def valid(datacfg, modelcfg, weightfile):\n",
" def truths_length(truths, max_num_gt=50):\n",
" for i in range(max_num_gt):\n",
" if truths[i][1] == 0:\n",
" return i\n",
"\n",
" # Parse configuration files\n",
" options = read_data_cfg(datacfg)\n",
" valid_images = options['valid']\n",
" meshname = options['mesh']\n",
" backupdir = options['backup']\n",
" name = options['name']\n",
" data_options = read_data_cfg(datacfg)\n",
" valid_images = data_options['valid']\n",
" meshname = data_options['mesh']\n",
" backupdir = data_options['backup']\n",
" name = data_options['name']\n",
" gpus = data_options['gpus'] \n",
" fx = float(data_options['fx'])\n",
" fy = float(data_options['fy'])\n",
" u0 = float(data_options['u0'])\n",
" v0 = float(data_options['v0'])\n",
" im_width = int(data_options['width'])\n",
" im_height = int(data_options['height'])\n",
" if not os.path.exists(backupdir):\n",
" makedirs(backupdir)\n",
"\n",
" # Parameters\n",
" prefix = 'results'\n",
" seed = int(time.time())\n",
" gpus = '0' # Specify which gpus to use\n",
" test_width = 544\n",
" test_height = 544\n",
" torch.manual_seed(seed)\n",
" use_cuda = True\n",
" if use_cuda:\n",
" os.environ['CUDA_VISIBLE_DEVICES'] = gpus\n",
" torch.cuda.manual_seed(seed)\n",
" seed = int(time.time())\n",
" os.environ['CUDA_VISIBLE_DEVICES'] = gpus\n",
" torch.cuda.manual_seed(seed)\n",
" save = False\n",
" visualize = True\n",
" testtime = True\n",
" use_cuda = True\n",
" num_classes = 1\n",
" testing_samples = 0.0\n",
" eps = 1e-5\n",
" notpredicted = 0 \n",
" conf_thresh = 0.1\n",
" nms_thresh = 0.4\n",
" match_thresh = 0.5\n",
" edges_corners = [[0, 1], [0, 2], [0, 4], [1, 3], [1, 5], [2, 3], [2, 6], [3, 7], [4, 5], [4, 6], [5, 7], [6, 7]]\n",
"\n",
" if save:\n",
" makedirs(backupdir + '/test')\n",
" makedirs(backupdir + '/test/gt')\n",
" makedirs(backupdir + '/test/pr')\n",
"\n",
" # To save\n",
" testing_error_trans = 0.0\n",
" testing_error_angle = 0.0\n",
@@ -108,17 +86,18 @@
" gts_trans = []\n",
" gts_rot = []\n",
" gts_corners2D = []\n",
" ious = []\n",
"\n",
" # Read object model information, get 3D bounding box corners\n",
" mesh = MeshPly(meshname)\n",
" vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()\n",
" corners3D = get_3D_corners(vertices)\n",
" # diam = calc_pts_diameter(np.array(mesh.vertices))\n",
" diam = float(options['diam'])\n",
"\n",
" mesh = MeshPly(meshname)\n",
" vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()\n",
" corners3D = get_3D_corners(vertices)\n",
" try:\n",
" diam = float(options['diam'])\n",
" except:\n",
" diam = calc_pts_diameter(np.array(mesh.vertices))\n",
" \n",
" # Read intrinsic camera parameters\n",
" internal_calibration = get_camera_intrinsic()\n",
" intrinsic_calibration = get_camera_intrinsic(u0, v0, fx, fy)\n",
"\n",
" # Get validation file names\n",
" with open(valid_images) as fp:\n",
@@ -126,29 +105,30 @@
" valid_files = [item.rstrip() for item in tmp_files]\n",
" \n",
" # Specicy model, load pretrained weights, pass to GPU and set the module in evaluation mode\n",
" model = Darknet(cfgfile)\n",
" model = Darknet(modelcfg)\n",
" model.print_network()\n",
" model.load_weights(weightfile)\n",
" model.cuda()\n",
" model.eval()\n",
" test_width = model.test_width\n",
" test_height = model.test_height\n",
" num_keypoints = model.num_keypoints \n",
" num_labels = num_keypoints * 2 + 3\n",
"\n",
" # Get the parser for the test dataset\n",
" valid_dataset = dataset.listDataset(valid_images, shape=(test_width, test_height),\n",
" shuffle=False,\n",
" transform=transforms.Compose([\n",
" transforms.ToTensor(),]))\n",
" valid_batchsize = 1\n",
" valid_dataset = dataset.listDataset(valid_images, \n",
" shape=(test_width, test_height),\n",
" shuffle=False,\n",
" transform=transforms.Compose([transforms.ToTensor(),]))\n",
"\n",
" # Specify the number of workers for multiple processing, get the dataloader for the test dataset\n",
" kwargs = {'num_workers': 4, 'pin_memory': True}\n",
" test_loader = torch.utils.data.DataLoader(\n",
" valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs) \n",
" test_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=1, shuffle=False, **kwargs) \n",
"\n",
" logging(\" Testing {}...\".format(name))\n",
" logging(\" Number of test samples: %d\" % len(test_loader.dataset))\n",
" # Iterate through test batches (Batch size for test data is 1)\n",
" count = 0\n",
" z = np.zeros((3, 1))\n",
" for batch_idx, (data, target) in enumerate(test_loader):\n",
" \n",
" # Images\n",
@@ -158,55 +138,39 @@
" \n",
" t1 = time.time()\n",
" # Pass data to GPU\n",
" if use_cuda:\n",
" data = data.cuda()\n",
" target = target.cuda()\n",
" \n",
" data = data.cuda()\n",
" target = target.cuda()\n",
" # Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference\n",
" data = Variable(data, volatile=True)\n",
" t2 = time.time()\n",
" \n",
" # Forward pass\n",
" output = model(data).data \n",
" t3 = time.time()\n",
" \n",
" # Using confidence threshold, eliminate low-confidence predictions\n",
" all_boxes = get_region_boxes(output, conf_thresh, num_classes) \n",
" all_boxes = get_region_boxes(output, num_classes, num_keypoints) \n",
" t4 = time.time()\n",
"\n",
" # Iterate through all images in the batch\n",
" for i in range(output.size(0)):\n",
" \n",
" # For each image, get all the predictions\n",
" boxes = all_boxes[i]\n",
" \n",
" # Evaluation\n",
" # Iterate through all batch elements\n",
" for box_pr, target in zip([all_boxes], [target[0]]):\n",
" # For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)\n",
" truths = target[i].view(-1, 21)\n",
" \n",
" # Get how many object are present in the scene\n",
" num_gts = truths_length(truths)\n",
"\n",
" # Iterate through each ground-truth object\n",
" truths = target.view(-1, num_keypoints*2+3)\n",
" # Get how many objects are present in the scene\n",
" num_gts = truths_length(truths)\n",
" # Iterate through each ground-truth object\n",
" for k in range(num_gts):\n",
" box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6], \n",
" truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12], \n",
" truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]\n",
" best_conf_est = -1\n",
"\n",
" # If the prediction has the highest confidence, choose it as our prediction for single object pose estimation\n",
" for j in range(len(boxes)):\n",
" if (boxes[j][18] > best_conf_est):\n",
" match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))\n",
" box_pr = boxes[j]\n",
" best_conf_est = boxes[j][18]\n",
" box_gt = list()\n",
" for j in range(1, 2*num_keypoints+1):\n",
" box_gt.append(truths[k][j])\n",
" box_gt.extend([1.0, 1.0])\n",
" box_gt.append(truths[k][0])\n",
"\n",
" # Denormalize the corner predictions \n",
" corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')\n",
" corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')\n",
" corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640\n",
" corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480 \n",
" corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640\n",
" corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480\n",
" corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width\n",
" corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height \n",
" corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width\n",
" corners2D_pr[:, 1] = corners2D_pr[:, 1] * im_height\n",
" preds_corners2D.append(corners2D_pr)\n",
" gts_corners2D.append(corners2D_gt)\n",
"\n",
@@ -216,21 +180,8 @@
" errs_corner2D.append(corner_dist)\n",
" \n",
" # Compute [R|t] by pnp\n",
" R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(internal_calibration, dtype='float32'))\n",
" R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))\n",
"\n",
" if save:\n",
" preds_trans.append(t_pr)\n",
" gts_trans.append(t_gt)\n",
" preds_rot.append(R_pr)\n",
" gts_rot.append(R_gt)\n",
"\n",
" np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))\n",
" R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(intrinsic_calibration, dtype='float32'))\n",
" R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(intrinsic_calibration, dtype='float32'))\n",
" \n",
" # Compute translation error\n",
" trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))\n",
@@ -243,19 +194,19 @@
" # Compute pixel error\n",
" Rt_gt = np.concatenate((R_gt, t_gt), axis=1)\n",
" Rt_pr = np.concatenate((R_pr, t_pr), axis=1)\n",
" proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)\n",
" proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration) \n",
" proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, internal_calibration)) \n",
" proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, internal_calibration)) \n",
" proj_2d_gt = compute_projection(vertices, Rt_gt, intrinsic_calibration)\n",
" proj_2d_pred = compute_projection(vertices, Rt_pr, intrinsic_calibration) \n",
" proj_corners_gt = np.transpose(compute_projection(corners3D, Rt_gt, intrinsic_calibration)) \n",
" proj_corners_pr = np.transpose(compute_projection(corners3D, Rt_pr, intrinsic_calibration)) \n",
" norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)\n",
" pixel_dist = np.mean(norm)\n",
" errs_2d.append(pixel_dist)\n",
"\n",
" if visualize:\n",
" # Visualize\n",
" plt.xlim((0, 640))\n",
" plt.ylim((0, 480))\n",
" plt.imshow(scipy.misc.imresize(img, (480, 640)))\n",
" plt.xlim((0, im_width))\n",
" plt.ylim((0, im_height))\n",
" plt.imshow(scipy.misc.imresize(img, (im_height, im_width)))\n",
" # Projections\n",
" for edge in edges_corners:\n",
" plt.plot(proj_corners_gt[edge, 0], proj_corners_gt[edge, 1], color='g', linewidth=3.0)\n",
@@ -263,12 +214,6 @@
" plt.gca().invert_yaxis()\n",
" plt.show()\n",
" \n",
" # Compute IoU score\n",
" bb_gt = compute_2d_bb_from_orig_pix(proj_2d_gt, output.size(3))\n",
" bb_pred = compute_2d_bb_from_orig_pix(proj_2d_pred, output.size(3))\n",
" iou = bbox_iou(bb_gt, bb_pred)\n",
" ious.append(iou)\n",
"\n",
" # Compute 3D distances\n",
" transform_3d_gt = compute_transformation(vertices, Rt_gt) \n",
" transform_3d_pred = compute_transformation(vertices, Rt_pr) \n",
@@ -283,33 +228,46 @@
" testing_samples += 1\n",
" count = count + 1\n",
"\n",
" if save:\n",
" preds_trans.append(t_pr)\n",
" gts_trans.append(t_gt)\n",
" preds_rot.append(R_pr)\n",
" gts_rot.append(R_gt)\n",
"\n",
" np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))\n",
" np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))\n",
"\n",
"\n",
" t5 = time.time()\n",
"\n",
" # Compute 2D projection error, 6D pose error, 5cm5degree error\n",
" px_threshold = 5\n",
" acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)\n",
" acciou = len(np.where(np.array(errs_2d) >= 0.5)[0]) * 100. / (len(ious)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)\n",
" mean_err_2d = np.mean(errs_2d)\n",
" px_threshold = 5 # 5 pixel threshold for 2D reprojection error is standard in recent sota 6D object pose estimation works \n",
" eps = 1e-5\n",
" acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)\n",
" acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)\n",
" corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)\n",
" mean_err_2d = np.mean(errs_2d)\n",
" mean_corner_err_2d = np.mean(errs_corner2D)\n",
" nts = float(testing_samples)\n",
"\n",
" if testtime:\n",
" print('-----------------------------------')\n",
" print(' tensor to cuda : %f' % (t2 - t1))\n",
" print(' predict : %f' % (t3 - t2))\n",
" print(' forward pass : %f' % (t3 - t2))\n",
" print('get_region_boxes : %f' % (t4 - t3))\n",
" print(' nms : %f' % (t5 - t4))\n",
" print(' total : %f' % (t5 - t1))\n",
" print(' prediction time : %f' % (t4 - t1))\n",
" print(' eval : %f' % (t5 - t4))\n",
" print('-----------------------------------')\n",
"\n",
" # Print test statistics\n",
" logging('Results of {}'.format(name))\n",
" logging(' Acc using {} px 2D Projection = {:.2f}%'.format(px_threshold, acc))\n",
" logging(' Acc using the IoU metric = {:.6f}%'.format(acciou))\n",
" logging(' Acc using 10% threshold - {} vx 3D Transformation = {:.2f}%'.format(diam * 0.1, acc3d10))\n",
" logging(' Acc using 5 cm 5 degree metric = {:.2f}%'.format(acc5cm5deg))\n",
" logging(\" Mean 2D pixel error is %f, Mean vertex error is %f, mean corner error is %f\" % (mean_err_2d, np.mean(errs_3d), mean_corner_err_2d))\n",
@@ -319,10 +277,11 @@
" predfile = backupdir + '/predictions_linemod_' + name + '.mat'\n",
" scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts':gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs':preds_trans, 'corner_prs': preds_corners2D})\n",
"\n",
"datacfg = 'cfg/ape.data'\n",
"cfgfile = 'cfg/yolo-pose.cfg'\n",
"datacfg = 'cfg/ape.data'\n",
"modelcfg = 'cfg/yolo-pose.cfg'\n",
"weightfile = 'backup/ape/model_backup.weights'\n",
"valid(datacfg, cfgfile, weightfile)"
"valid(datacfg, modelcfg, weightfile)\n",
" "
]
},
{
@@ -335,21 +294,21 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"display_name": "Python 3",
"language": "python",
"name": "python2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,

221
valid.py
View file

@@ -1,63 +1,51 @@
import os
import time
import torch
from torch.autograd import Variable
from torchvision import datasets, transforms
import argparse
import scipy.io
import warnings
warnings.filterwarnings("ignore")
from torch.autograd import Variable
from torchvision import datasets, transforms
from darknet import Darknet
import dataset
from darknet import Darknet
from utils import *
from MeshPly import MeshPly
# Create new directory
def makedirs(path):
if not os.path.exists( path ):
os.makedirs( path )
def valid(datacfg, cfgfile, weightfile, outfile):
def truths_length(truths):
for i in range(50):
def valid(datacfg, modelcfg, weightfile):
def truths_length(truths, max_num_gt=50):
for i in range(max_num_gt):
if truths[i][1] == 0:
return i
# Parse configuration files
options = read_data_cfg(datacfg)
valid_images = options['valid']
meshname = options['mesh']
backupdir = options['backup']
name = options['name']
data_options = read_data_cfg(datacfg)
valid_images = data_options['valid']
meshname = data_options['mesh']
backupdir = data_options['backup']
name = data_options['name']
gpus = data_options['gpus']
fx = float(data_options['fx'])
fy = float(data_options['fy'])
u0 = float(data_options['u0'])
v0 = float(data_options['v0'])
im_width = int(data_options['width'])
im_height = int(data_options['height'])
if not os.path.exists(backupdir):
makedirs(backupdir)
# Parameters
prefix = 'results'
seed = int(time.time())
gpus = '0' # Specify which gpus to use
test_width = 544
test_height = 544
torch.manual_seed(seed)
use_cuda = True
if use_cuda:
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
seed = int(time.time())
os.environ['CUDA_VISIBLE_DEVICES'] = gpus
torch.cuda.manual_seed(seed)
save = False
testtime = True
use_cuda = True
num_classes = 1
testing_samples = 0.0
eps = 1e-5
notpredicted = 0
conf_thresh = 0.1
nms_thresh = 0.4
match_thresh = 0.5
if save:
makedirs(backupdir + '/test')
makedirs(backupdir + '/test/gt')
makedirs(backupdir + '/test/pr')
# To save
testing_error_trans = 0.0
testing_error_angle = 0.0
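Each ground-truth row handled by `valid` has `num_keypoints*2 + 3` entries: the class id, the normalized (x, y) image coordinates of the keypoints (for the default model, the centroid plus the eight corners of the 3D bounding box, i.e. 9 keypoints and 21 numbers in total), and two trailing range values. A self-contained sketch of how the `truths_length` helper above counts objects in such a padded target, using made-up values:
```
import torch

def truths_length(truths, max_num_gt=50):
    # Same idea as the nested helper above: padded rows are all zeros, so the
    # first row whose second entry is 0 marks the end of the real objects.
    for i in range(max_num_gt):
        if truths[i][1] == 0:
            return i
    return max_num_gt  # added here so the sketch always returns a value

num_keypoints = 9
target = torch.zeros(50 * (num_keypoints * 2 + 3))
target[0:21]  = torch.rand(21) + 0.01    # hypothetical object 1
target[21:42] = torch.rand(21) + 0.01    # hypothetical object 2

truths = target.view(-1, num_keypoints * 2 + 3)
print(truths_length(truths))             # -> 2
```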
@@ -75,14 +63,16 @@ def valid(datacfg, cfgfile, weightfile, outfile):
gts_corners2D = []
# Read object model information, get 3D bounding box corners
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
# diam = calc_pts_diameter(np.array(mesh.vertices))
diam = float(options['diam'])
mesh = MeshPly(meshname)
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
try:
diam = float(options['diam'])
except:
diam = calc_pts_diameter(np.array(mesh.vertices))
# Read intrinsic camera parameters
internal_calibration = get_camera_intrinsic()
intrinsic_calibration = get_camera_intrinsic(u0, v0, fx, fy)
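`get_camera_intrinsic` now takes the principal point and focal lengths from the data config instead of hard-coded values; presumably it assembles the standard 3x3 pinhole matrix, along these lines:
```
import numpy as np

def camera_intrinsic_sketch(u0, v0, fx, fy):
    # Assumed layout of the intrinsic matrix K used for projection and PnP.
    return np.array([[fx, 0.0, u0],
                     [0.0, fy, v0],
                     [0.0, 0.0, 1.0]], dtype='float32')
```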
# Get validation file names
with open(valid_images) as fp:
@@ -90,82 +80,66 @@ def valid(datacfg, cfgfile, weightfile, outfile):
valid_files = [item.rstrip() for item in tmp_files]
# Specify model, load pretrained weights, pass to GPU and set the module in evaluation mode
model = Darknet(cfgfile)
model = Darknet(modelcfg)
model.print_network()
model.load_weights(weightfile)
model.cuda()
model.eval()
test_width = model.test_width
test_height = model.test_height
num_keypoints = model.num_keypoints
num_labels = num_keypoints * 2 + 3
# Get the parser for the test dataset
valid_dataset = dataset.listDataset(valid_images, shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([
transforms.ToTensor(),]))
valid_batchsize = 1
valid_dataset = dataset.listDataset(valid_images,
shape=(test_width, test_height),
shuffle=False,
transform=transforms.Compose([transforms.ToTensor(),]))
# Specify the number of workers for multiprocessing, get the dataloader for the test dataset
kwargs = {'num_workers': 4, 'pin_memory': True}
test_loader = torch.utils.data.DataLoader(
valid_dataset, batch_size=valid_batchsize, shuffle=False, **kwargs)
test_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=1, shuffle=False, **kwargs)
logging(" Testing {}...".format(name))
logging(" Number of test samples: %d" % len(test_loader.dataset))
# Iterate through test batches (Batch size for test data is 1)
count = 0
z = np.zeros((3, 1))
for batch_idx, (data, target) in enumerate(test_loader):
t1 = time.time()
# Pass data to GPU
if use_cuda:
data = data.cuda()
target = target.cuda()
data = data.cuda()
target = target.cuda()
# Wrap tensors in Variable class, set volatile=True for inference mode and to use minimal memory during inference
data = Variable(data, volatile=True)
t2 = time.time()
# Forward pass
output = model(data).data
t3 = time.time()
# Using confidence threshold, eliminate low-confidence predictions
all_boxes = get_region_boxes(output, conf_thresh, num_classes)
all_boxes = get_region_boxes(output, num_classes, num_keypoints)
t4 = time.time()
# Iterate through all images in the batch
for i in range(output.size(0)):
# For each image, get all the predictions
boxes = all_boxes[i]
# Evaluation
# Iterate through all batch elements
for box_pr, target in zip([all_boxes], [target[0]]):
# For each image, get all the targets (for multiple object pose estimation, there might be more than 1 target per image)
truths = target[i].view(-1, 21)
# Get how many object are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
truths = target.view(-1, num_keypoints*2+3)
# Get how many objects are present in the scene
num_gts = truths_length(truths)
# Iterate through each ground-truth object
for k in range(num_gts):
box_gt = [truths[k][1], truths[k][2], truths[k][3], truths[k][4], truths[k][5], truths[k][6],
truths[k][7], truths[k][8], truths[k][9], truths[k][10], truths[k][11], truths[k][12],
truths[k][13], truths[k][14], truths[k][15], truths[k][16], truths[k][17], truths[k][18], 1.0, 1.0, truths[k][0]]
best_conf_est = -1
# If the prediction has the highest confidence, choose it as our prediction for single object pose estimation
for j in range(len(boxes)):
if (boxes[j][18] > best_conf_est):
match = corner_confidence9(box_gt[:18], torch.FloatTensor(boxes[j][:18]))
box_pr = boxes[j]
best_conf_est = boxes[j][18]
box_gt = list()
for j in range(1, 2*num_keypoints+1):
box_gt.append(truths[k][j])
box_gt.extend([1.0, 1.0])
box_gt.append(truths[k][0])
# Denormalize the corner predictions
corners2D_gt = np.array(np.reshape(box_gt[:18], [9, 2]), dtype='float32')
corners2D_pr = np.array(np.reshape(box_pr[:18], [9, 2]), dtype='float32')
corners2D_gt[:, 0] = corners2D_gt[:, 0] * 640
corners2D_gt[:, 1] = corners2D_gt[:, 1] * 480
corners2D_pr[:, 0] = corners2D_pr[:, 0] * 640
corners2D_pr[:, 1] = corners2D_pr[:, 1] * 480
corners2D_gt[:, 0] = corners2D_gt[:, 0] * im_width
corners2D_gt[:, 1] = corners2D_gt[:, 1] * im_height
corners2D_pr[:, 0] = corners2D_pr[:, 0] * im_width
corners2D_pr[:, 1] = corners2D_pr[:, 1] * im_height
preds_corners2D.append(corners2D_pr)
gts_corners2D.append(corners2D_gt)
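A side note on the inference block in this hunk: in PyTorch 0.4.1 the `volatile` flag passed to `Variable` no longer has any effect (it only triggers a warning, which this script silences). A hedged sketch of the equivalent modern pattern, not what the repository itself does, is to wrap the forward pass in `torch.no_grad()`:
```
import torch

def predict_no_grad(model, data):
    # Alternative to Variable(data, volatile=True) for PyTorch >= 0.4:
    # disable autograd bookkeeping during the forward pass.
    model.eval()
    with torch.no_grad():
        return model(data.cuda())
```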
@@ -175,21 +149,8 @@ def valid(datacfg, cfgfile, weightfile, outfile):
errs_corner2D.append(corner_dist)
# Compute [R|t] by pnp
R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(internal_calibration, dtype='float32'))
R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(internal_calibration, dtype='float32'))
if save:
preds_trans.append(t_pr)
gts_trans.append(t_gt)
preds_rot.append(R_pr)
gts_rot.append(R_gt)
np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))
np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))
np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))
np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))
R_gt, t_gt = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_gt, np.array(intrinsic_calibration, dtype='float32'))
R_pr, t_pr = pnp(np.array(np.transpose(np.concatenate((np.zeros((3, 1)), corners3D[:3, :]), axis=1)), dtype='float32'), corners2D_pr, np.array(intrinsic_calibration, dtype='float32'))
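The `pnp` helper itself is not part of this diff; a plausible sketch of such a routine built on OpenCV, assuming undistorted images and the iterative solver, would be:
```
import cv2
import numpy as np

def pnp_sketch(points_3D, points_2D, K):
    # Recover object pose from 3D-2D correspondences; no lens distortion assumed.
    dist_coeffs = np.zeros((4, 1), dtype='float32')
    _, rvec, tvec = cv2.solvePnP(points_3D, points_2D, K, dist_coeffs,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)   # axis-angle vector -> 3x3 rotation matrix
    return R, tvec
```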
# Compute translation error
trans_dist = np.sqrt(np.sum(np.square(t_gt - t_pr)))
@@ -202,8 +163,8 @@ def valid(datacfg, cfgfile, weightfile, outfile):
# Compute pixel error
Rt_gt = np.concatenate((R_gt, t_gt), axis=1)
Rt_pr = np.concatenate((R_pr, t_pr), axis=1)
proj_2d_gt = compute_projection(vertices, Rt_gt, internal_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, internal_calibration)
proj_2d_gt = compute_projection(vertices, Rt_gt, intrinsic_calibration)
proj_2d_pred = compute_projection(vertices, Rt_pr, intrinsic_calibration)
norm = np.linalg.norm(proj_2d_gt - proj_2d_pred, axis=0)
pixel_dist = np.mean(norm)
errs_2d.append(pixel_dist)
@@ -222,26 +183,41 @@ def valid(datacfg, cfgfile, weightfile, outfile):
testing_samples += 1
count = count + 1
if save:
preds_trans.append(t_pr)
gts_trans.append(t_gt)
preds_rot.append(R_pr)
gts_rot.append(R_gt)
np.savetxt(backupdir + '/test/gt/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_gt, dtype='float32'))
np.savetxt(backupdir + '/test/gt/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/R_' + valid_files[count][-8:-3] + 'txt', np.array(R_pr, dtype='float32'))
np.savetxt(backupdir + '/test/pr/t_' + valid_files[count][-8:-3] + 'txt', np.array(t_pr, dtype='float32'))
np.savetxt(backupdir + '/test/gt/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_gt, dtype='float32'))
np.savetxt(backupdir + '/test/pr/corners_' + valid_files[count][-8:-3] + 'txt', np.array(corners2D_pr, dtype='float32'))
t5 = time.time()
# Compute 2D projection error, 6D pose error, 5cm5degree error
px_threshold = 5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
px_threshold = 5 # 5 pixel threshold for 2D reprojection error is standard in recent sota 6D object pose estimation works
eps = 1e-5
acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100. / (len(errs_2d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
acc3d10 = len(np.where(np.array(errs_3d) <= diam * 0.1)[0]) * 100. / (len(errs_3d)+eps)
acc5cm5deg = len(np.where((np.array(errs_trans) <= 0.05) & (np.array(errs_angle) <= 5))[0]) * 100. / (len(errs_trans)+eps)
corner_acc = len(np.where(np.array(errs_corner2D) <= px_threshold)[0]) * 100. / (len(errs_corner2D)+eps)
mean_err_2d = np.mean(errs_2d)
mean_corner_err_2d = np.mean(errs_corner2D)
nts = float(testing_samples)
if testtime:
print('-----------------------------------')
print(' tensor to cuda : %f' % (t2 - t1))
print(' predict : %f' % (t3 - t2))
print(' forward pass : %f' % (t3 - t2))
print('get_region_boxes : %f' % (t4 - t3))
print(' prediction time : %f' % (t4 - t1))
print(' eval : %f' % (t5 - t4))
print(' total : %f' % (t5 - t1))
print('-----------------------------------')
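To make the thresholds in the accuracy figures computed above concrete (5 px for 2D reprojection, 10% of the model diameter for the 3D vertex metric, 5 cm / 5 degrees for the pose metric), a toy computation with made-up per-image errors:
```
import numpy as np

errs_2d    = np.array([2.3, 4.9, 7.1])        # pixels
errs_3d    = np.array([0.004, 0.012, 0.020])  # meters
errs_trans = np.array([0.01, 0.04, 0.09])     # meters
errs_angle = np.array([2.0, 4.5, 7.0])        # degrees
diam       = 0.10                              # hypothetical diameter in meters

acc        = np.mean(errs_2d <= 5) * 100.                              # 66.7
acc3d10    = np.mean(errs_3d <= 0.1 * diam) * 100.                     # 33.3
acc5cm5deg = np.mean((errs_trans <= 0.05) & (errs_angle <= 5)) * 100.  # 66.7
```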
# Print test statistics
@@ -257,13 +233,14 @@ def valid(datacfg, cfgfile, weightfile, outfile):
scipy.io.savemat(predfile, {'R_gts': gts_rot, 't_gts':gts_trans, 'corner_gts': gts_corners2D, 'R_prs': preds_rot, 't_prs':preds_trans, 'corner_prs': preds_corners2D})
if __name__ == '__main__':
import sys
if len(sys.argv) == 4:
datacfg = sys.argv[1]
cfgfile = sys.argv[2]
weightfile = sys.argv[3]
outfile = 'comp4_det_test_'
valid(datacfg, cfgfile, weightfile, outfile)
else:
print('Usage:')
print(' python valid.py datacfg cfgfile weightfile')
# Parse configuration files
parser = argparse.ArgumentParser(description='SingleShotPose')
parser.add_argument('--datacfg', type=str, default='cfg/ape.data') # data config
parser.add_argument('--modelcfg', type=str, default='cfg/yolo-pose.cfg') # network config
parser.add_argument('--weightfile', type=str, default='backup/ape/model_backup.weights') # imagenet initialized weights
args = parser.parse_args()
datacfg = args.datacfg
modelcfg = args.modelcfg
weightfile = args.weightfile
valid(datacfg, modelcfg, weightfile)
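With the argparse interface above, validation is launched as e.g. `python valid.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --weightfile backup/ape/model_backup.weights` (the defaults shown). A sketch of the equivalent programmatic call, assuming the config files and trained weights exist:
```
from valid import valid

valid(datacfg='cfg/ape.data',
      modelcfg='cfg/yolo-pose.cfg',
      weightfile='backup/ape/model_backup.weights')
```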