This commit is contained in:
Parent
a362bfa1f2
Commit
f926dffb45
@@ -14,8 +14,6 @@ dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/

@@ -102,3 +100,8 @@ venv.bak/

# mypy
.mypy_cache/

/data
/output
/models
/log
@@ -0,0 +1,14 @@
# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
README.md
@@ -1,14 +1,188 @@
# Simple Baselines for Pose Estimation and Pose Tracking

## Introduction
This is an official pytorch implementation of [*Simple Baselines for Pose Estimation and Pose Tracking*](https://arxiv.org/abs/1804.06208). This work provides baseline methods that are surprisingly simple and effective, and thus helpful for inspiring and evaluating new ideas for the field. State-of-the-art results are achieved on challenging benchmarks. On the COCO keypoint validation set, our best **single model** achieves **74.3 mAP**. You can reproduce our results using this repo. All models are provided for research purposes.

## Main Results
### Results on MPII val
| arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 |
| 384x384_pose_resnet_50_d256d256d256 | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 |
| 256x256_pose_resnet_101_d256d256d256 | 96.862 | 95.873 | 89.518 | 84.376 | 88.437 | 84.486 | 80.703 | 89.131 | 34.020 |
| 384x384_pose_resnet_101_d256d256d256 | 96.965 | 95.907 | 90.268 | 85.780 | 89.597 | 85.935 | 82.098 | 90.003 | 38.860 |
| 256x256_pose_resnet_152_d256d256d256 | 97.033 | 95.941 | 90.046 | 84.976 | 89.164 | 85.311 | 81.271 | 89.620 | 35.025 |
| 384x384_pose_resnet_152_d256d256d256 | 96.794 | 95.618 | 90.080 | 86.225 | 89.700 | 86.862 | 82.853 | 90.200 | 39.433 |

### Note:
- Flip test is used

### Results on COCO val2017 (using a person detector with 56.4 AP on COCO val2017)
| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| 256x192_pose_resnet_50_d256d256d256 | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 |
| 384x288_pose_resnet_50_d256d256d256 | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 |
| 256x192_pose_resnet_101_d256d256d256 | 0.714 | 0.893 | 0.793 | 0.681 | 0.781 | 0.771 | 0.934 | 0.840 | 0.730 | 0.832 |
| 384x288_pose_resnet_101_d256d256d256 | 0.736 | 0.896 | 0.803 | 0.699 | 0.811 | 0.791 | 0.936 | 0.851 | 0.745 | 0.858 |
| 256x192_pose_resnet_152_d256d256d256 | 0.720 | 0.893 | 0.798 | 0.687 | 0.789 | 0.778 | 0.934 | 0.846 | 0.736 | 0.839 |
| 384x288_pose_resnet_152_d256d256d256 | 0.743 | 0.896 | 0.811 | 0.705 | 0.816 | 0.797 | 0.937 | 0.858 | 0.751 | 0.863 |

### Note:
- Flip test is used
- Person detector has person AP of 56.4 on COCO val2017 dataset

## Environment
The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

## Quick start
### Installation
1. Install pytorch >= v0.4.0 following the [official instructions](https://pytorch.org/)
2. Disable cudnn for batch_norm:
```
# PYTORCH=/path/to/pytorch
# for pytorch v0.4.0
sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
# for pytorch v0.4.1
sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
```
Note that instructions like `# PYTORCH=/path/to/pytorch` indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (`PYTORCH` in this case) accordingly.
3. Clone this repo; we'll call the directory that you cloned ${POSE_ROOT}.
4. Install dependencies:
```
pip install -r requirements.txt
```
5. Install [COCOAPI](https://github.com/cocodataset/cocoapi):
```
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python3 setup.py install --user
```
Note that instructions like `# COCOAPI=/path/to/clone/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (`COCOAPI` in this case) accordingly.
6. Download pytorch imagenet pretrained models from the [pytorch model zoo](https://pytorch.org/docs/stable/model_zoo.html#module-torch.utils.model_zoo).
7. Download mpii and coco pretrained models from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW0D5ZE4ArK9wk_fvw). Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

```
${POSE_ROOT}
 `-- models
     `-- pytorch
         |-- imagenet
         |   |-- resnet50-19c8e357.pth
         |   |-- resnet101-5d3b4d8f.pth
         |   `-- resnet152-b121ed2d.pth
         |-- pose_coco
         |   |-- pose_resnet_101_256x192.pth.tar
         |   |-- pose_resnet_101_384x288.pth.tar
         |   |-- pose_resnet_152_256x192.pth.tar
         |   |-- pose_resnet_152_384x288.pth.tar
         |   |-- pose_resnet_50_256x192.pth.tar
         |   `-- pose_resnet_50_384x288.pth.tar
         `-- pose_mpii
             |-- pose_resnet_101_256x256.pth.tar
             |-- pose_resnet_101_384x384.pth.tar
             |-- pose_resnet_152_256x256.pth.tar
             |-- pose_resnet_152_384x384.pth.tar
             |-- pose_resnet_50_256x256.pth.tar
             `-- pose_resnet_50_384x384.pth.tar
```

8. Init output (training model output directory) and log (tensorboard log directory) directories:

```
mkdir output
mkdir log
```

Your directory tree should then look like this:

```
${POSE_ROOT}
├── data
├── experiments
├── lib
├── log
├── models
├── output
├── pose_estimation
├── README.md
└── requirements.txt
```

### Data preparation
**For MPII data**, please download from [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/). The original annotation files are in matlab format; we have converted them to json format, which you also need to download from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW00SqrairNetmeVu4).
Extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg
```

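As a quick sanity check that the converted annotations are in place, you can load one of the json files; this snippet is illustrative only and assumes the converted files are JSON arrays of per-sample records:

```
import json

with open('data/mpii/annot/valid.json') as f:
    annots = json.load(f)
print(len(annots), 'annotation records')
print(sorted(annots[0].keys()))
```
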
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 needed to reproduce our multi-person pose estimation results. Please download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzA1A-y1AH-pZQdS).
Download and extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```

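Similarly, once the COCO annotations are extracted, they can be sanity-checked with the COCO API installed earlier:

```
from pycocotools.coco import COCO

coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
print(len(coco.getImgIds()), 'val2017 images')
```
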
### Validate on MPII using pretrained models

```
python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar
```

### Training on MPII

```
python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
```

### Validate on COCO val2017 using pretrained models

```
python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar
```

### Training on COCO train2017

```
python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml
```

@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
  IMAGE_SIZE:
  - 192
  - 256
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 48
    - 64
    SIGMA: 2
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 101
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
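Each YAML file in this commit is an experiment definition consumed by update_config in lib/core/config.py (added later in this diff). A minimal usage sketch, run from ${POSE_ROOT} so that lib/ is importable; the config path is the MPII example quoted in the README, since the exact file paths of the configs on this page are not shown:

import sys
sys.path.insert(0, 'lib')  # make the repo's core package importable

from core.config import config, update_config

update_config('experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml')
print(config.MODEL.EXTRA.NUM_LAYERS)  # 50
print(config.TRAIN.END_EPOCH)         # 140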
@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
  IMAGE_SIZE:
  - 288
  - 384
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 72
    - 96
    SIGMA: 3
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 101
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
  IMAGE_SIZE:
  - 192
  - 256
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 48
    - 64
    SIGMA: 2
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 152
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
  IMAGE_SIZE:
  - 288
  - 384
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 72
    - 96
    SIGMA: 3
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 152
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
  IMAGE_SIZE:
  - 192
  - 256
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 48
    - 64
    SIGMA: 2
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 50
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: 'coco'
  ROOT: 'data/coco/'
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
  FLIP: true
  ROT_FACTOR: 40
  SCALE_FACTOR: 0.3
MODEL:
  NAME: 'pose_resnet'
  PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
  IMAGE_SIZE:
  - 288
  - 384
  NUM_JOINTS: 17
  EXTRA:
    TARGET_TYPE: 'gaussian'
    HEATMAP_SIZE:
    - 72
    - 96
    SIGMA: 3
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 50
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: 'adam'
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  FLIP_TEST: false
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
  IMAGE_SIZE:
  - 256
  - 256
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    SIGMA: 2
    HEATMAP_SIZE:
    - 64
    - 64
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 101
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,69 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100

DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
  IMAGE_SIZE:
  - 384
  - 384
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    HEATMAP_SIZE:
    - 96
    - 96
    SIGMA: 3
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 101
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
  IMAGE_SIZE:
  - 256
  - 256
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    SIGMA: 2
    HEATMAP_SIZE:
    - 64
    - 64
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 152
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
  IMAGE_SIZE:
  - 256
  - 256
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    SIGMA: 2
    HEATMAP_SIZE:
    - 64
    - 64
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 152
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 24
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
  IMAGE_SIZE:
  - 256
  - 256
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    SIGMA: 2
    HEATMAP_SIZE:
    - 64
    - 64
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 50
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: mpii
  ROOT: 'data/mpii/'
  TEST_SET: valid
  TRAIN_SET: train
  FLIP: true
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
MODEL:
  NAME: pose_resnet
  PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
  IMAGE_SIZE:
  - 256
  - 256
  NUM_JOINTS: 16
  EXTRA:
    TARGET_TYPE: gaussian
    SIGMA: 2
    HEATMAP_SIZE:
    - 64
    - 64
    FINAL_CONV_KERNEL: 1
    DECONV_WITH_BIAS: false
    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
    NUM_LAYERS: 50
LOSS:
  USE_TARGET_WEIGHT: true
TRAIN:
  BATCH_SIZE: 32
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: false
  OPTIMIZER: adam
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP:
  - 90
  - 120
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE: 32
  FLIP_TEST: false
  MODEL_FILE: ''
DEBUG:
  DEBUG: false
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
@@ -0,0 +1,4 @@
all:
	cd nms; python setup.py build_ext --inplace; rm -rf build; cd ../../
clean:
	cd nms; rm *.so; cd ../../
@@ -0,0 +1,225 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import yaml

import numpy as np
from easydict import EasyDict as edict


config = edict()

config.OUTPUT_DIR = ''
config.LOG_DIR = ''
config.DATA_DIR = ''
config.GPUS = '0'
config.WORKERS = 4
config.PRINT_FREQ = 20

# Cudnn related params
config.CUDNN = edict()
config.CUDNN.BENCHMARK = True
config.CUDNN.DETERMINISTIC = False
config.CUDNN.ENABLED = True

# pose_resnet related params
POSE_RESNET = edict()
POSE_RESNET.NUM_LAYERS = 50
POSE_RESNET.DECONV_WITH_BIAS = False
POSE_RESNET.NUM_DECONV_LAYERS = 3
POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256]
POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4]
POSE_RESNET.FINAL_CONV_KERNEL = 1
POSE_RESNET.TARGET_TYPE = 'gaussian'
POSE_RESNET.HEATMAP_SIZE = [64, 64]  # width * height, ex: 24 * 32
POSE_RESNET.SIGMA = 2

MODEL_EXTRAS = {
    'pose_resnet': POSE_RESNET,
}

# common params for NETWORK
config.MODEL = edict()
config.MODEL.NAME = 'pose_resnet'
config.MODEL.INIT_WEIGHTS = True
config.MODEL.PRETRAINED = ''
config.MODEL.NUM_JOINTS = 16
config.MODEL.IMAGE_SIZE = [256, 256]  # width * height, ex: 192 * 256
config.MODEL.EXTRA = MODEL_EXTRAS[config.MODEL.NAME]

config.LOSS = edict()
config.LOSS.USE_TARGET_WEIGHT = True

# DATASET related params
config.DATASET = edict()
config.DATASET.ROOT = ''
config.DATASET.DATASET = 'mpii'
config.DATASET.TRAIN_SET = 'train'
config.DATASET.TEST_SET = 'valid'
config.DATASET.DATA_FORMAT = 'jpg'
config.DATASET.HYBRID_JOINTS_TYPE = ''
config.DATASET.SELECT_DATA = False

# training data augmentation
config.DATASET.FLIP = True
config.DATASET.SCALE_FACTOR = 0.25
config.DATASET.ROT_FACTOR = 30

# train
config.TRAIN = edict()

config.TRAIN.LR_FACTOR = 0.1
config.TRAIN.LR_STEP = [90, 110]
config.TRAIN.LR = 0.001

config.TRAIN.OPTIMIZER = 'adam'
config.TRAIN.MOMENTUM = 0.9
config.TRAIN.WD = 0.0001
config.TRAIN.NESTEROV = False
config.TRAIN.GAMMA1 = 0.99
config.TRAIN.GAMMA2 = 0.0

config.TRAIN.BEGIN_EPOCH = 0
config.TRAIN.END_EPOCH = 140

config.TRAIN.RESUME = False
config.TRAIN.CHECKPOINT = ''

config.TRAIN.BATCH_SIZE = 32
config.TRAIN.SHUFFLE = True

# testing
config.TEST = edict()

# size of images for each device
config.TEST.BATCH_SIZE = 32
# Test Model Epoch
config.TEST.FLIP_TEST = False
config.TEST.POST_PROCESS = True
config.TEST.SHIFT_HEATMAP = True

config.TEST.USE_GT_BBOX = False
# nms
config.TEST.OKS_THRE = 0.5
config.TEST.IN_VIS_THRE = 0.0
config.TEST.COCO_BBOX_FILE = ''
config.TEST.BBOX_THRE = 1.0
config.TEST.MODEL_FILE = ''

# debug
config.DEBUG = edict()
config.DEBUG.DEBUG = False
config.DEBUG.SAVE_BATCH_IMAGES_GT = False
config.DEBUG.SAVE_BATCH_IMAGES_PRED = False
config.DEBUG.SAVE_HEATMAPS_GT = False
config.DEBUG.SAVE_HEATMAPS_PRED = False


def _update_dict(k, v):
    if k == 'DATASET':
        if 'MEAN' in v and v['MEAN']:
            v['MEAN'] = np.array([eval(x) if isinstance(x, str) else x
                                  for x in v['MEAN']])
        if 'STD' in v and v['STD']:
            v['STD'] = np.array([eval(x) if isinstance(x, str) else x
                                 for x in v['STD']])
    if k == 'MODEL':
        if 'EXTRA' in v and 'HEATMAP_SIZE' in v['EXTRA']:
            if isinstance(v['EXTRA']['HEATMAP_SIZE'], int):
                v['EXTRA']['HEATMAP_SIZE'] = np.array(
                    [v['EXTRA']['HEATMAP_SIZE'], v['EXTRA']['HEATMAP_SIZE']])
            else:
                v['EXTRA']['HEATMAP_SIZE'] = np.array(
                    v['EXTRA']['HEATMAP_SIZE'])
        if 'IMAGE_SIZE' in v:
            if isinstance(v['IMAGE_SIZE'], int):
                v['IMAGE_SIZE'] = np.array([v['IMAGE_SIZE'], v['IMAGE_SIZE']])
            else:
                v['IMAGE_SIZE'] = np.array(v['IMAGE_SIZE'])
    for vk, vv in v.items():
        if vk in config[k]:
            config[k][vk] = vv
        else:
            raise ValueError("{}.{} does not exist in config.py".format(k, vk))


def update_config(config_file):
    exp_config = None
    with open(config_file) as f:
        exp_config = edict(yaml.load(f))
        for k, v in exp_config.items():
            if k in config:
                if isinstance(v, dict):
                    _update_dict(k, v)
                else:
                    if k == 'SCALES':
                        config[k][0] = (tuple(v))
                    else:
                        config[k] = v
            else:
                raise ValueError("{} does not exist in config.py".format(k))


def gen_config(config_file):
    cfg = dict(config)
    for k, v in cfg.items():
        if isinstance(v, edict):
            cfg[k] = dict(v)

    with open(config_file, 'w') as f:
        yaml.dump(dict(cfg), f, default_flow_style=False)


def update_dir(model_dir, log_dir, data_dir):
    if model_dir:
        config.OUTPUT_DIR = model_dir

    if log_dir:
        config.LOG_DIR = log_dir

    if data_dir:
        config.DATA_DIR = data_dir

    config.DATASET.ROOT = os.path.join(
        config.DATA_DIR, config.DATASET.ROOT)

    config.TEST.COCO_BBOX_FILE = os.path.join(
        config.DATA_DIR, config.TEST.COCO_BBOX_FILE)

    config.MODEL.PRETRAINED = os.path.join(
        config.DATA_DIR, config.MODEL.PRETRAINED)


def get_model_name(cfg):
    name = cfg.MODEL.NAME
    full_name = cfg.MODEL.NAME
    extra = cfg.MODEL.EXTRA
    if name in ['pose_resnet']:
        name = '{model}_{num_layers}'.format(
            model=name,
            num_layers=extra.NUM_LAYERS)
        deconv_suffix = ''.join(
            'd{}'.format(num_filters)
            for num_filters in extra.NUM_DECONV_FILTERS)
        full_name = '{height}x{width}_{name}_{deconv_suffix}'.format(
            height=cfg.MODEL.IMAGE_SIZE[1],
            width=cfg.MODEL.IMAGE_SIZE[0],
            name=name,
            deconv_suffix=deconv_suffix)
    else:
        raise ValueError('Unknown model: {}'.format(cfg.MODEL))

    return name, full_name


if __name__ == '__main__':
    import sys
    gen_config(sys.argv[1])
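As a usage sketch, the architecture names in the README result tables come out of get_model_name; assuming lib/ is on the path and a config has been loaded as in the earlier sketch:

from core.config import config, update_config, get_model_name

update_config('experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml')
name, full_name = get_model_name(config)
print(name)       # pose_resnet_50
print(full_name)  # 256x256_pose_resnet_50_d256d256d256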
@@ -0,0 +1,71 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from core.inference import get_max_preds


def calc_dists(preds, target, normalize):
    preds = preds.astype(np.float32)
    target = target.astype(np.float32)
    dists = np.zeros((preds.shape[1], preds.shape[0]))
    for n in range(preds.shape[0]):
        for c in range(preds.shape[1]):
            if target[n, c, 0] > 1 and target[n, c, 1] > 1:
                normed_preds = preds[n, c, :] / normalize[n]
                normed_targets = target[n, c, :] / normalize[n]
                dists[c, n] = np.linalg.norm(normed_preds - normed_targets)
            else:
                dists[c, n] = -1
    return dists


def dist_acc(dists, thr=0.5):
    ''' Return percentage below threshold while ignoring values with a -1 '''
    dist_cal = np.not_equal(dists, -1)
    num_dist_cal = dist_cal.sum()
    if num_dist_cal > 0:
        return np.less(dists[dist_cal], thr).sum() * 1.0 / num_dist_cal
    else:
        return -1


def accuracy(output, target, hm_type='gaussian', thr=0.5):
    '''
    Calculate accuracy according to PCK,
    but uses ground truth heatmap rather than x,y locations
    First value to be returned is average accuracy across 'idxs',
    followed by individual accuracies
    '''
    idx = list(range(output.shape[1]))
    norm = 1.0
    if hm_type == 'gaussian':
        pred, _ = get_max_preds(output)
        target, _ = get_max_preds(target)
        h = output.shape[2]
        w = output.shape[3]
        norm = np.ones((pred.shape[0], 2)) * np.array([h, w]) / 10
    dists = calc_dists(pred, target, norm)

    acc = np.zeros((len(idx) + 1))
    avg_acc = 0
    cnt = 0

    for i in range(len(idx)):
        acc[i + 1] = dist_acc(dists[idx[i]])
        if acc[i + 1] >= 0:
            avg_acc = avg_acc + acc[i + 1]
            cnt += 1

    avg_acc = avg_acc / cnt if cnt != 0 else 0
    if cnt != 0:
        acc[0] = avg_acc
    return acc, avg_acc, cnt, pred
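A quick standalone check of accuracy, not part of this commit: feeding identical prediction and target heatmaps makes every normalized distance zero, so the PCK score should come out at 1.0 (assumes lib/ is on the path, as above).

import numpy as np

from core.evaluate import accuracy

# a batch of 8 samples, 16 joints, 64x64 heatmaps
heatmaps = np.random.rand(8, 16, 64, 64).astype(np.float32)
acc, avg_acc, cnt, pred = accuracy(heatmaps, heatmaps)
print(avg_acc)  # 1.0 -- all distances are 0, below the default 0.5 threshold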
@@ -0,0 +1,239 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import logging
import time
import os

import numpy as np
import torch

from core.config import get_model_name
from core.evaluate import accuracy
from core.inference import get_final_preds
from utils.transforms import flip_back
from utils.vis import save_debug_images


logger = logging.getLogger(__name__)


def train(config, train_loader, model, criterion, optimizer, epoch,
          output_dir, tb_log_dir, writer_dict):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    acc = AverageMeter()

    # switch to train mode
    model.train()

    end = time.time()
    for i, (input, target, target_weight, meta) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        # compute output
        output = model(input)
        target = target.cuda(non_blocking=True)
        target_weight = target_weight.cuda(non_blocking=True)

        loss = criterion(output, target, target_weight)

        # compute gradient and do update step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure accuracy and record loss
        losses.update(loss.item(), input.size(0))

        _, avg_acc, cnt, pred = accuracy(output.detach().cpu().numpy(),
                                         target.detach().cpu().numpy())
        acc.update(avg_acc, cnt)

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % config.PRINT_FREQ == 0:
            msg = 'Epoch: [{0}][{1}/{2}]\t' \
                  'Time {batch_time.val:.3f}s ({batch_time.avg:.3f}s)\t' \
                  'Speed {speed:.1f} samples/s\t' \
                  'Data {data_time.val:.3f}s ({data_time.avg:.3f}s)\t' \
                  'Loss {loss.val:.5f} ({loss.avg:.5f})\t' \
                  'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(
                      epoch, i, len(train_loader), batch_time=batch_time,
                      speed=input.size(0)/batch_time.val,
                      data_time=data_time, loss=losses, acc=acc)
            logger.info(msg)

            writer = writer_dict['writer']
            global_steps = writer_dict['train_global_steps']
            writer.add_scalar('train_loss', losses.val, global_steps)
            writer.add_scalar('train_acc', acc.val, global_steps)
            writer_dict['train_global_steps'] = global_steps + 1

            prefix = '{}_{}'.format(os.path.join(output_dir, 'train'), i)
            save_debug_images(config, input, meta, target, pred*4, output,
                              prefix)


def validate(config, val_loader, val_dataset, model, criterion, output_dir,
             tb_log_dir, writer_dict=None):
    batch_time = AverageMeter()
    losses = AverageMeter()
    acc = AverageMeter()

    # switch to evaluate mode
    model.eval()

    num_samples = len(val_dataset)
    all_preds = np.zeros((num_samples, config.MODEL.NUM_JOINTS, 3),
                         dtype=np.float32)
    all_boxes = np.zeros((num_samples, 6))
    image_path = []
    filenames = []
    imgnums = []
    idx = 0
    with torch.no_grad():
        end = time.time()
        for i, (input, target, target_weight, meta) in enumerate(val_loader):
            # compute output
            output = model(input)
            if config.TEST.FLIP_TEST:
                # this part is ugly, because pytorch does not support negative indexing
                # input_flipped = model(input[:, :, :, ::-1])
                input_flipped = np.flip(input.cpu().numpy(), 3).copy()
                input_flipped = torch.from_numpy(input_flipped).cuda()
                output_flipped = model(input_flipped)
                output_flipped = flip_back(output_flipped.cpu().numpy(),
                                           val_dataset.flip_pairs)
                output_flipped = torch.from_numpy(output_flipped.copy()).cuda()

                # feature is not aligned, shift flipped heatmap for higher accuracy
                if config.TEST.SHIFT_HEATMAP:
                    output_flipped[:, :, :, 1:] = \
                        output_flipped.clone()[:, :, :, 0:-1]
                    # output_flipped[:, :, :, 0] = 0

                output = (output + output_flipped) * 0.5

            target = target.cuda(non_blocking=True)
            target_weight = target_weight.cuda(non_blocking=True)

            loss = criterion(output, target, target_weight)

            num_images = input.size(0)
            # measure accuracy and record loss
            losses.update(loss.item(), num_images)
            _, avg_acc, cnt, pred = accuracy(output.cpu().numpy(),
                                             target.cpu().numpy())

            acc.update(avg_acc, cnt)

            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()

            c = meta['center'].numpy()
            s = meta['scale'].numpy()
            score = meta['score'].numpy()

            preds, maxvals = get_final_preds(
                config, output.clone().cpu().numpy(), c, s)

            all_preds[idx:idx + num_images, :, 0:2] = preds[:, :, 0:2]
            all_preds[idx:idx + num_images, :, 2:3] = maxvals
            # double check this all_boxes parts
            all_boxes[idx:idx + num_images, 0:2] = c[:, 0:2]
            all_boxes[idx:idx + num_images, 2:4] = s[:, 0:2]
            all_boxes[idx:idx + num_images, 4] = np.prod(s*200, 1)
            all_boxes[idx:idx + num_images, 5] = score
            image_path.extend(meta['image'])
            if config.DATASET.DATASET == 'posetrack':
                filenames.extend(meta['filename'])
                imgnums.extend(meta['imgnum'].numpy())

            idx += num_images

            if i % config.PRINT_FREQ == 0:
                msg = 'Test: [{0}/{1}]\t' \
                      'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \
                      'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(
                          i, len(val_loader), batch_time=batch_time,
                          loss=losses, acc=acc)
                logger.info(msg)

                prefix = '{}_{}'.format(os.path.join(output_dir, 'val'), i)
                save_debug_images(config, input, meta, target, pred*4, output,
                                  prefix)

        name_values, perf_indicator = val_dataset.evaluate(
            config, all_preds, output_dir, all_boxes, image_path,
            filenames, imgnums)

        _, full_arch_name = get_model_name(config)
        if isinstance(name_values, list):
            for name_value in name_values:
                _print_name_value(name_value, full_arch_name)
        else:
            _print_name_value(name_values, full_arch_name)

        if writer_dict:
            writer = writer_dict['writer']
            global_steps = writer_dict['valid_global_steps']
            writer.add_scalar('valid_loss', losses.avg, global_steps)
            writer.add_scalar('valid_acc', acc.avg, global_steps)
            if isinstance(name_values, list):
                for name_value in name_values:
                    writer.add_scalars('valid', dict(name_value), global_steps)
            else:
                writer.add_scalars('valid', dict(name_values), global_steps)
            writer_dict['valid_global_steps'] = global_steps + 1

    return perf_indicator


# markdown format output
def _print_name_value(name_value, full_arch_name):
    names = name_value.keys()
    values = name_value.values()
    num_values = len(name_value)
    logger.info(
        '| Arch ' +
        ' '.join(['| {}'.format(name) for name in names]) +
        ' |'
    )
    logger.info('|---' * (num_values+1) + '|')
    logger.info(
        '| ' + full_arch_name + ' ' +
        ' '.join(['| {:.3f}'.format(value) for value in values]) +
        ' |'
    )


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count if self.count != 0 else 0
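AverageMeter drives all of the running statistics in the log lines above; a quick standalone check of its weighted running mean (assumes lib/ is on the path, as in the earlier sketches):

from core.function import AverageMeter

meter = AverageMeter()
for batch_loss, batch_size in [(0.9, 32), (0.7, 32), (0.5, 16)]:
    meter.update(batch_loss, n=batch_size)
print(meter.val)  # 0.5, the most recent value
print(meter.avg)  # 0.74 = (0.9*32 + 0.7*32 + 0.5*16) / 80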
@@ -0,0 +1,74 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import numpy as np

from utils.transforms import transform_preds


def get_max_preds(batch_heatmaps):
    '''
    get predictions from score maps
    heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
    '''
    assert isinstance(batch_heatmaps, np.ndarray), \
        'batch_heatmaps should be numpy.ndarray'
    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'

    batch_size = batch_heatmaps.shape[0]
    num_joints = batch_heatmaps.shape[1]
    width = batch_heatmaps.shape[3]
    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
    idx = np.argmax(heatmaps_reshaped, 2)
    maxvals = np.amax(heatmaps_reshaped, 2)

    maxvals = maxvals.reshape((batch_size, num_joints, 1))
    idx = idx.reshape((batch_size, num_joints, 1))

    preds = np.tile(idx, (1, 1, 2)).astype(np.float32)

    preds[:, :, 0] = (preds[:, :, 0]) % width
    preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)

    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
    pred_mask = pred_mask.astype(np.float32)

    preds *= pred_mask
    return preds, maxvals


def get_final_preds(config, batch_heatmaps, center, scale):
    coords, maxvals = get_max_preds(batch_heatmaps)

    heatmap_height = batch_heatmaps.shape[2]
    heatmap_width = batch_heatmaps.shape[3]

    # post-processing
    if config.TEST.POST_PROCESS:
        for n in range(coords.shape[0]):
            for p in range(coords.shape[1]):
                hm = batch_heatmaps[n][p]
                px = int(math.floor(coords[n][p][0] + 0.5))
                py = int(math.floor(coords[n][p][1] + 0.5))
                if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1:
                    diff = np.array([hm[py][px+1] - hm[py][px-1],
                                     hm[py+1][px]-hm[py-1][px]])
                    coords[n][p] += np.sign(diff) * .25

    preds = coords.copy()

    # Transform back
    for i in range(coords.shape[0]):
        preds[i] = transform_preds(coords[i], center[i], scale[i],
                                   [heatmap_width, heatmap_height])

    return preds, maxvals
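The coordinate decoding in get_max_preds can be verified on a synthetic heatmap with a single known peak; a sketch, not part of this commit:

import numpy as np

from core.inference import get_max_preds

heatmaps = np.zeros((1, 1, 64, 64), dtype=np.float32)
heatmaps[0, 0, 20, 30] = 1.0  # peak at row 20 (y), column 30 (x)
preds, maxvals = get_max_preds(heatmaps)
print(preds)    # [[[30. 20.]]] -- (x, y) of the argmax
print(maxvals)  # [[[1.]]]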
@@ -0,0 +1,38 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import torch.nn as nn


class JointsMSELoss(nn.Module):
    def __init__(self, use_target_weight):
        super(JointsMSELoss, self).__init__()
        self.criterion = nn.MSELoss(size_average=True)
        self.use_target_weight = use_target_weight

    def forward(self, output, target, target_weight):
        batch_size = output.size(0)
        num_joints = output.size(1)
        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)
        loss = 0

        for idx in range(num_joints):
            heatmap_pred = heatmaps_pred[idx].squeeze()
            heatmap_gt = heatmaps_gt[idx].squeeze()
            if self.use_target_weight:
                loss += 0.5 * self.criterion(
                    heatmap_pred.mul(target_weight[:, idx]),
                    heatmap_gt.mul(target_weight[:, idx])
                )
            else:
                loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt)

        return loss / num_joints
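A usage sketch for JointsMSELoss with shapes matching the MPII configs above (16 joints, 64x64 heatmaps); the tensors here are random placeholders, not real model outputs:

import torch

from core.loss import JointsMSELoss

criterion = JointsMSELoss(use_target_weight=True)
output = torch.rand(4, 16, 64, 64)    # predicted heatmaps
target = torch.rand(4, 16, 64, 64)    # ground-truth heatmaps
target_weight = torch.ones(4, 16, 1)  # 1 = joint visible, 0 = ignore
loss = criterion(output, target, target_weight)
print(loss.item())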
@ -0,0 +1,218 @@
|
|||
# ------------------------------------------------------------------------------
|
||||
# Copyright (c) Microsoft
|
||||
# Licensed under the MIT License.
|
||||
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
|
||||
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import copy
import logging
import random

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

from utils.transforms import get_affine_transform
from utils.transforms import affine_transform
from utils.transforms import fliplr_joints


logger = logging.getLogger(__name__)


class JointsDataset(Dataset):
    def __init__(self, cfg, root, image_set, is_train, transform=None):
        self.num_joints = 0
        self.pixel_std = 200
        self.flip_pairs = []
        self.parent_ids = []

        self.is_train = is_train
        self.root = root
        self.image_set = image_set

        self.output_path = cfg.OUTPUT_DIR
        self.data_format = cfg.DATASET.DATA_FORMAT

        self.scale_factor = cfg.DATASET.SCALE_FACTOR
        self.rotation_factor = cfg.DATASET.ROT_FACTOR
        self.flip = cfg.DATASET.FLIP

        self.image_size = cfg.MODEL.IMAGE_SIZE
        self.target_type = cfg.MODEL.EXTRA.TARGET_TYPE
        self.heatmap_size = cfg.MODEL.EXTRA.HEATMAP_SIZE
        self.sigma = cfg.MODEL.EXTRA.SIGMA

        self.transform = transform
        self.db = []

    def _get_db(self):
        raise NotImplementedError

    def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
        raise NotImplementedError

    def __len__(self):
        return len(self.db)

    def __getitem__(self, idx):
        db_rec = copy.deepcopy(self.db[idx])

        image_file = db_rec['image']
        filename = db_rec['filename'] if 'filename' in db_rec else ''
        imgnum = db_rec['imgnum'] if 'imgnum' in db_rec else ''

        if self.data_format == 'zip':
            from utils import zipreader
            data_numpy = zipreader.imread(
                image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
        else:
            data_numpy = cv2.imread(
                image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)

        if data_numpy is None:
            # cv2.imread returns None instead of raising on an unreadable file
            logger.error('=> fail to read {}'.format(image_file))
            raise ValueError('fail to read {}'.format(image_file))

        joints = db_rec['joints_3d']
        joints_vis = db_rec['joints_3d_vis']

        c = db_rec['center']
        s = db_rec['scale']
        score = db_rec['score'] if 'score' in db_rec else 1
        r = 0

        if self.is_train:
            sf = self.scale_factor
            rf = self.rotation_factor
            s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf)
            r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \
                if random.random() <= 0.6 else 0

            if self.flip and random.random() <= 0.5:
                data_numpy = data_numpy[:, ::-1, :]
                joints, joints_vis = fliplr_joints(
                    joints, joints_vis, data_numpy.shape[1], self.flip_pairs)
                c[0] = data_numpy.shape[1] - c[0] - 1

        trans = get_affine_transform(c, s, r, self.image_size)
        input = cv2.warpAffine(
            data_numpy,
            trans,
            (int(self.image_size[0]), int(self.image_size[1])),
            flags=cv2.INTER_LINEAR)

        if self.transform:
            input = self.transform(input)

        for i in range(self.num_joints):
            if joints_vis[i, 0] > 0.0:
                joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)

        target, target_weight = self.generate_target(joints, joints_vis)

        target = torch.from_numpy(target)
        target_weight = torch.from_numpy(target_weight)

        meta = {
            'image': image_file,
            'filename': filename,
            'imgnum': imgnum,
            'joints': joints,
            'joints_vis': joints_vis,
            'center': c,
            'scale': s,
            'rotation': r,
            'score': score
        }

        return input, target, target_weight, meta

    def select_data(self, db):
        db_selected = []
        for rec in db:
            num_vis = 0
            joints_x = 0.0
            joints_y = 0.0
            for joint, joint_vis in zip(
                    rec['joints_3d'], rec['joints_3d_vis']):
                if joint_vis[0] <= 0:
                    continue
                num_vis += 1

                joints_x += joint[0]
                joints_y += joint[1]
            if num_vis == 0:
                continue

            joints_x, joints_y = joints_x / num_vis, joints_y / num_vis

            area = rec['scale'][0] * rec['scale'][1] * (self.pixel_std**2)
            joints_center = np.array([joints_x, joints_y])
            bbox_center = np.array(rec['center'])
            diff_norm2 = np.linalg.norm((joints_center-bbox_center), 2)
            ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area))

            metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16
            if ks > metric:
                db_selected.append(rec)

        logger.info('=> num db: {}'.format(len(db)))
        logger.info('=> num selected db: {}'.format(len(db_selected)))
        return db_selected

    def generate_target(self, joints, joints_vis):
        '''
        :param joints: [num_joints, 3]
        :param joints_vis: [num_joints, 3]
        :return: target, target_weight(1: visible, 0: invisible)
        '''
        target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
        target_weight[:, 0] = joints_vis[:, 0]

        assert self.target_type == 'gaussian', \
            'Only gaussian targets are supported for now!'

        if self.target_type == 'gaussian':
            target = np.zeros((self.num_joints,
                               self.heatmap_size[1],
                               self.heatmap_size[0]),
                              dtype=np.float32)

            tmp_size = self.sigma * 3

            for joint_id in range(self.num_joints):
                feat_stride = self.image_size / self.heatmap_size
                mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
                mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
                # Check that any part of the gaussian is in-bounds
                ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
                br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
                if ul[0] >= self.heatmap_size[0] or ul[1] >= self.heatmap_size[1] \
                        or br[0] < 0 or br[1] < 0:
                    # If not, skip this joint and zero its weight
                    target_weight[joint_id] = 0
                    continue

                # Generate gaussian
                size = 2 * tmp_size + 1
                x = np.arange(0, size, 1, np.float32)
                y = x[:, np.newaxis]
                x0 = y0 = size // 2
                # The gaussian is not normalized; we want the center value to equal 1
                g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * self.sigma ** 2))

                # Usable gaussian range
                g_x = max(0, -ul[0]), min(br[0], self.heatmap_size[0]) - ul[0]
                g_y = max(0, -ul[1]), min(br[1], self.heatmap_size[1]) - ul[1]
                # Image range
                img_x = max(0, ul[0]), min(br[0], self.heatmap_size[0])
                img_y = max(0, ul[1]), min(br[1], self.heatmap_size[1])

                v = target_weight[joint_id]
                if v > 0.5:
                    target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \
                        g[g_y[0]:g_y[1], g_x[0]:g_x[1]]

        return target, target_weight
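The unnormalized Gaussian written into each heatmap peaks at exactly 1 at the quantized joint location. A standalone sketch of that patch computation (illustrative only; assumes sigma = 2, as in the default configs):

import numpy as np

sigma = 2
tmp_size = sigma * 3
size = 2 * tmp_size + 1                      # 13x13 patch
x = np.arange(0, size, 1, np.float32)
y = x[:, np.newaxis]
x0 = y0 = size // 2
g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
assert g.max() == 1.0                        # peak value is 1 at the joint pixel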
@@ -0,0 +1,12 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from .mpii import MPIIDataset as mpii
from .coco import COCODataset as coco
@@ -0,0 +1,404 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import logging
import os
import pickle
from collections import defaultdict
from collections import OrderedDict

import json_tricks as json
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

from dataset.JointsDataset import JointsDataset
from nms.nms import oks_nms


logger = logging.getLogger(__name__)


class COCODataset(JointsDataset):
    '''
    "keypoints": {
        0: "nose",
        1: "left_eye",
        2: "right_eye",
        3: "left_ear",
        4: "right_ear",
        5: "left_shoulder",
        6: "right_shoulder",
        7: "left_elbow",
        8: "right_elbow",
        9: "left_wrist",
        10: "right_wrist",
        11: "left_hip",
        12: "right_hip",
        13: "left_knee",
        14: "right_knee",
        15: "left_ankle",
        16: "right_ankle"
    },
    "skeleton": [
        [16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13], [6,7],[6,8],
        [7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]]
    '''
    def __init__(self, cfg, root, image_set, is_train, transform=None):
        super().__init__(cfg, root, image_set, is_train, transform)
        self.nms_thre = cfg.TEST.NMS_THRE
        self.image_thre = cfg.TEST.IMAGE_THRE
        self.oks_thre = cfg.TEST.OKS_THRE
        self.in_vis_thre = cfg.TEST.IN_VIS_THRE
        self.bbox_file = cfg.TEST.COCO_BBOX_FILE
        self.use_gt_bbox = cfg.TEST.USE_GT_BBOX
        self.image_width = cfg.MODEL.IMAGE_SIZE[0]
        self.image_height = cfg.MODEL.IMAGE_SIZE[1]
        self.aspect_ratio = self.image_width * 1.0 / self.image_height
        self.pixel_std = 200
        self.coco = COCO(self._get_ann_file_keypoint())

        # deal with class names
        cats = [cat['name']
                for cat in self.coco.loadCats(self.coco.getCatIds())]
        self.classes = ['__background__'] + cats
        logger.info('=> classes: {}'.format(self.classes))
        self.num_classes = len(self.classes)
        self._class_to_ind = dict(zip(self.classes, range(self.num_classes)))
        self._class_to_coco_ind = dict(zip(cats, self.coco.getCatIds()))
        self._coco_ind_to_class_ind = dict([(self._class_to_coco_ind[cls],
                                             self._class_to_ind[cls])
                                            for cls in self.classes[1:]])

        # load image file names
        self.image_set_index = self._load_image_set_index()
        self.num_images = len(self.image_set_index)
        logger.info('=> num_images: {}'.format(self.num_images))

        self.num_joints = 17
        self.flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8],
                           [9, 10], [11, 12], [13, 14], [15, 16]]
        self.parent_ids = None

        self.db = self._get_db()

        if is_train and cfg.DATASET.SELECT_DATA:
            self.db = self.select_data(self.db)

        logger.info('=> load {} samples'.format(len(self.db)))

    def _get_ann_file_keypoint(self):
        """ self.root / annotations / person_keypoints_train2017.json """
        prefix = 'person_keypoints' \
            if 'test' not in self.image_set else 'image_info'
        return os.path.join(self.root, 'annotations',
                            prefix + '_' + self.image_set + '.json')

    def _load_image_set_index(self):
        """ image id: int """
        image_ids = self.coco.getImgIds()
        return image_ids

    def _get_db(self):
        if self.is_train or self.use_gt_bbox:
            # use ground truth bbox
            gt_db = self._load_coco_keypoint_annotations()
        else:
            # use bbox from detection
            gt_db = self._load_coco_person_detection_results()
        return gt_db

    def _load_coco_keypoint_annotations(self):
        """ ground truth bbox and keypoints """
        gt_db = []
        for index in self.image_set_index:
            gt_db.extend(self._load_coco_keypoint_annotation_kernel(index))
        return gt_db

    def _load_coco_keypoint_annotation_kernel(self, index):
        """
        coco ann: [u'segmentation', u'area', u'iscrowd', u'image_id', u'bbox', u'category_id', u'id']
        iscrowd:
            crowd instances are handled by marking their overlaps with all categories to -1
            and later excluded in training
        bbox:
            [x1, y1, w, h]
        :param index: coco image id
        :return: db entry
        """
        im_ann = self.coco.loadImgs(index)[0]
        width = im_ann['width']
        height = im_ann['height']

        annIds = self.coco.getAnnIds(imgIds=index, iscrowd=False)
        objs = self.coco.loadAnns(annIds)

        # sanitize bboxes
        valid_objs = []
        for obj in objs:
            x, y, w, h = obj['bbox']
            x1 = np.max((0, x))
            y1 = np.max((0, y))
            x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
            y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
            if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
                # obj['clean_bbox'] = [x1, y1, x2, y2]
                obj['clean_bbox'] = [x1, y1, x2-x1, y2-y1]
                valid_objs.append(obj)
        objs = valid_objs

        rec = []
        for obj in objs:
            cls = self._coco_ind_to_class_ind[obj['category_id']]
            if cls != 1:
                continue

            # ignore objs without keypoints annotation
            if max(obj['keypoints']) == 0:
                continue

            joints_3d = np.zeros((self.num_joints, 3), dtype=np.float64)
            joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float64)
            for ipt in range(self.num_joints):
                joints_3d[ipt, 0] = obj['keypoints'][ipt * 3 + 0]
                joints_3d[ipt, 1] = obj['keypoints'][ipt * 3 + 1]
                joints_3d[ipt, 2] = 0
                t_vis = obj['keypoints'][ipt * 3 + 2]
                if t_vis > 1:
                    t_vis = 1
                joints_3d_vis[ipt, 0] = t_vis
                joints_3d_vis[ipt, 1] = t_vis
                joints_3d_vis[ipt, 2] = 0

            center, scale = self._box2cs(obj['clean_bbox'][:4])
            rec.append({
                'image': self.image_path_from_index(index),
                'center': center,
                'scale': scale,
                'joints_3d': joints_3d,
                'joints_3d_vis': joints_3d_vis,
                'filename': '',
                'imgnum': 0,
            })

        return rec

    def _box2cs(self, box):
        x, y, w, h = box[:4]
        return self._xywh2cs(x, y, w, h)

    def _xywh2cs(self, x, y, w, h):
        center = np.zeros((2), dtype=np.float32)
        center[0] = x + w * 0.5
        center[1] = y + h * 0.5

        if w > self.aspect_ratio * h:
            h = w * 1.0 / self.aspect_ratio
        elif w < self.aspect_ratio * h:
            w = h * self.aspect_ratio
        scale = np.array(
            [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
            dtype=np.float32)
        if center[0] != -1:
            scale = scale * 1.25

        return center, scale

    def image_path_from_index(self, index):
        """ example: images / train2017 / 000000119993.jpg """
        file_name = '%012d.jpg' % index
        if '2014' in self.image_set:
            file_name = 'COCO_%s_' % self.image_set + file_name

        prefix = 'test2017' if 'test' in self.image_set else self.image_set

        data_name = prefix + '.zip@' if self.data_format == 'zip' else prefix

        image_path = os.path.join(
            self.root, 'images', data_name, file_name)

        return image_path

    def _load_coco_person_detection_results(self):
        all_boxes = None
        with open(self.bbox_file, 'r') as f:
            all_boxes = json.load(f)

        if not all_boxes:
            logger.error('=> Failed to load %s!' % self.bbox_file)
            return None

        logger.info('=> Total boxes: {}'.format(len(all_boxes)))

        kpt_db = []
        num_boxes = 0
        for n_img in range(0, len(all_boxes)):
            det_res = all_boxes[n_img]
            if det_res['category_id'] != 1:
                continue
            img_name = self.image_path_from_index(det_res['image_id'])
            box = det_res['bbox']
            score = det_res['score']

            if score < self.image_thre:
                continue

            num_boxes = num_boxes + 1

            center, scale = self._box2cs(box)
            joints_3d = np.zeros((self.num_joints, 3), dtype=np.float64)
            joints_3d_vis = np.ones(
                (self.num_joints, 3), dtype=np.float64)
            kpt_db.append({
                'image': img_name,
                'center': center,
                'scale': scale,
                'score': score,
                'joints_3d': joints_3d,
                'joints_3d_vis': joints_3d_vis,
            })

        logger.info('=> Total boxes after filtering low score@{}: {}'.format(
            self.image_thre, num_boxes))
        return kpt_db

    # need double check this API and classes field
    def evaluate(self, cfg, preds, output_dir, all_boxes, img_path,
                 *args, **kwargs):
        res_folder = os.path.join(output_dir, 'results')
        if not os.path.exists(res_folder):
            os.makedirs(res_folder)
        res_file = os.path.join(
            res_folder, 'keypoints_%s_results.json' % self.image_set)

        # person x (keypoints)
        _kpts = []
        for idx, kpt in enumerate(preds):
            _kpts.append({
                'keypoints': kpt,
                'center': all_boxes[idx][0:2],
                'scale': all_boxes[idx][2:4],
                'area': all_boxes[idx][4],
                'score': all_boxes[idx][5],
                'image': int(img_path[idx][-16:-4])
            })
        # image x person x (keypoints)
        kpts = defaultdict(list)
        for kpt in _kpts:
            kpts[kpt['image']].append(kpt)

        # rescoring and oks nms
        num_joints = self.num_joints
        in_vis_thre = self.in_vis_thre
        oks_thre = self.oks_thre
        oks_nmsed_kpts = []
        for img in kpts.keys():
            img_kpts = kpts[img]
            for n_p in img_kpts:
                box_score = n_p['score']
                kpt_score = 0
                valid_num = 0
                for n_jt in range(0, num_joints):
                    t_s = n_p['keypoints'][n_jt][2]
                    if t_s > in_vis_thre:
                        kpt_score = kpt_score + t_s
                        valid_num = valid_num + 1
                if valid_num != 0:
                    kpt_score = kpt_score / valid_num
                # rescoring
                n_p['score'] = kpt_score * box_score
            keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))],
                           oks_thre)
            if len(keep) == 0:
                oks_nmsed_kpts.append(img_kpts)
            else:
                oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep])

        self._write_coco_keypoint_results(
            oks_nmsed_kpts, res_file)
        if 'test' not in self.image_set:
            info_str = self._do_python_keypoint_eval(
                res_file, res_folder)
            name_value = OrderedDict(info_str)
            return name_value, name_value['AP']
        else:
            return {'Null': 0}, 0

    def _write_coco_keypoint_results(self, keypoints, res_file):
        data_pack = [{'cat_id': self._class_to_coco_ind[cls],
                      'cls_ind': cls_ind,
                      'cls': cls,
                      'ann_type': 'keypoints',
                      'keypoints': keypoints
                      }
                     for cls_ind, cls in enumerate(self.classes) if cls != '__background__']

        results = self._coco_keypoint_results_one_category_kernel(data_pack[0])
        logger.info('=> Writing results json to %s' % res_file)
        with open(res_file, 'w') as f:
            json.dump(results, f, sort_keys=True, indent=4)
        try:
            with open(res_file) as f:
                json.load(f)
        except Exception:
            content = []
            with open(res_file, 'r') as f:
                for line in f:
                    content.append(line)
            content[-1] = ']'
            with open(res_file, 'w') as f:
                for c in content:
                    f.write(c)

    def _coco_keypoint_results_one_category_kernel(self, data_pack):
        cat_id = data_pack['cat_id']
        keypoints = data_pack['keypoints']
        cat_results = []

        for img_kpts in keypoints:
            if len(img_kpts) == 0:
                continue

            _key_points = np.array([img_kpts[k]['keypoints']
                                    for k in range(len(img_kpts))])
            key_points = np.zeros(
                (_key_points.shape[0], self.num_joints * 3), dtype=np.float64)

            for ipt in range(self.num_joints):
                key_points[:, ipt * 3 + 0] = _key_points[:, ipt, 0]
                key_points[:, ipt * 3 + 1] = _key_points[:, ipt, 1]
                key_points[:, ipt * 3 + 2] = _key_points[:, ipt, 2]  # keypoints score.

            result = [{'image_id': img_kpts[k]['image'],
                       'category_id': cat_id,
                       'keypoints': list(key_points[k]),
                       'score': img_kpts[k]['score'],
                       'center': list(img_kpts[k]['center']),
                       'scale': list(img_kpts[k]['scale'])
                       } for k in range(len(img_kpts))]
            cat_results.extend(result)

        return cat_results

    def _do_python_keypoint_eval(self, res_file, res_folder):
        coco_dt = self.coco.loadRes(res_file)
        coco_eval = COCOeval(self.coco, coco_dt, 'keypoints')
        coco_eval.params.useSegm = None
        coco_eval.evaluate()
        coco_eval.accumulate()
        coco_eval.summarize()

        # stock pycocotools' summarize() prints the table but returns None,
        # so collect the keypoint stats explicitly
        stats_names = ['AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)',
                       'AR', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)']
        info_str = list(zip(stats_names, coco_eval.stats))

        eval_file = os.path.join(
            res_folder, 'keypoints_%s_results.pkl' % self.image_set)

        with open(eval_file, 'wb') as f:
            pickle.dump(coco_eval, f, pickle.HIGHEST_PROTOCOL)
            logger.info('=> coco eval results saved to %s' % eval_file)

        return info_str
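For reference, a standalone rerun of the _xywh2cs arithmetic above (illustrative values; assumes the default 256x192 input, i.e. aspect_ratio = 192/256 = 0.75 and pixel_std = 200):

import numpy as np

aspect_ratio, pixel_std = 192.0 / 256.0, 200
x, y, w, h = 10, 20, 50, 100                     # hypothetical COCO box
center = np.array([x + w * 0.5, y + h * 0.5])    # -> [35., 70.]
w = h * aspect_ratio if w < aspect_ratio * h else w  # pad the box to 3:4 -> w = 75
scale = np.array([w, h]) / pixel_std * 1.25      # -> [0.46875, 0.625]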
@@ -0,0 +1,176 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from collections import OrderedDict
import logging
import os
import json_tricks as json

import numpy as np
from scipy.io import loadmat, savemat

from dataset.JointsDataset import JointsDataset


logger = logging.getLogger(__name__)


class MPIIDataset(JointsDataset):
    def __init__(self, cfg, root, image_set, is_train, transform=None):
        super().__init__(cfg, root, image_set, is_train, transform)

        self.num_joints = 16
        self.flip_pairs = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
        self.parent_ids = [1, 2, 6, 6, 3, 4, 6, 6, 7, 8, 11, 12, 7, 7, 13, 14]

        self.db = self._get_db()

        if is_train and cfg.DATASET.SELECT_DATA:
            self.db = self.select_data(self.db)

        logger.info('=> load {} samples'.format(len(self.db)))

    def _get_db(self):
        # create train/val split
        file_name = os.path.join(self.root,
                                 'annot',
                                 self.image_set+'.json')
        with open(file_name) as anno_file:
            anno = json.load(anno_file)

        gt_db = []
        for a in anno:
            image_name = a['image']

            c = np.array(a['center'], dtype=np.float64)
            s = np.array([a['scale'], a['scale']], dtype=np.float64)

            # Adjust center/scale slightly to avoid cropping limbs
            if c[0] != -1:
                c[1] = c[1] + 15 * s[1]
                s = s * 1.25

            # MPII annotations come from MATLAB and are 1-based,
            # so convert them to 0-based indices first
            c = c - 1

            joints_3d = np.zeros((self.num_joints, 3), dtype=np.float64)
            joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float64)
            if self.image_set != 'test':
                joints = np.array(a['joints'])
                joints[:, 0:2] = joints[:, 0:2] - 1
                joints_vis = np.array(a['joints_vis'])
                assert len(joints) == self.num_joints, \
                    'joint num diff: {} vs {}'.format(len(joints),
                                                      self.num_joints)

                joints_3d[:, 0:2] = joints[:, 0:2]
                joints_3d_vis[:, 0] = joints_vis[:]
                joints_3d_vis[:, 1] = joints_vis[:]

            image_dir = 'images.zip@' if self.data_format == 'zip' else 'images'
            gt_db.append({
                'image': os.path.join(self.root, image_dir, image_name),
                'center': c,
                'scale': s,
                'joints_3d': joints_3d,
                'joints_3d_vis': joints_3d_vis,
                'filename': '',
                'imgnum': 0,
            })

        return gt_db

    def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
        # convert 0-based index to 1-based index
        preds = preds[:, :, 0:2] + 1.0

        if output_dir:
            pred_file = os.path.join(output_dir, 'pred.mat')
            savemat(pred_file, mdict={'preds': preds})

        if 'test' in cfg.DATASET.TEST_SET:
            return {'Null': 0.0}, 0.0

        SC_BIAS = 0.6
        threshold = 0.5

        gt_file = os.path.join(cfg.DATASET.ROOT,
                               'annot',
                               'gt_{}.mat'.format(cfg.DATASET.TEST_SET))
        gt_dict = loadmat(gt_file)
        dataset_joints = gt_dict['dataset_joints']
        jnt_missing = gt_dict['jnt_missing']
        pos_gt_src = gt_dict['pos_gt_src']
        headboxes_src = gt_dict['headboxes_src']

        pos_pred_src = np.transpose(preds, [1, 2, 0])

        head = np.where(dataset_joints == 'head')[1][0]
        lsho = np.where(dataset_joints == 'lsho')[1][0]
        lelb = np.where(dataset_joints == 'lelb')[1][0]
        lwri = np.where(dataset_joints == 'lwri')[1][0]
        lhip = np.where(dataset_joints == 'lhip')[1][0]
        lkne = np.where(dataset_joints == 'lkne')[1][0]
        lank = np.where(dataset_joints == 'lank')[1][0]

        rsho = np.where(dataset_joints == 'rsho')[1][0]
        relb = np.where(dataset_joints == 'relb')[1][0]
        rwri = np.where(dataset_joints == 'rwri')[1][0]
        rkne = np.where(dataset_joints == 'rkne')[1][0]
        rank = np.where(dataset_joints == 'rank')[1][0]
        rhip = np.where(dataset_joints == 'rhip')[1][0]

        jnt_visible = 1 - jnt_missing
        uv_error = pos_pred_src - pos_gt_src
        uv_err = np.linalg.norm(uv_error, axis=1)
        headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :]
        headsizes = np.linalg.norm(headsizes, axis=0)
        headsizes *= SC_BIAS
        scale = np.multiply(headsizes, np.ones((len(uv_err), 1)))
        scaled_uv_err = np.divide(uv_err, scale)
        scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible)
        jnt_count = np.sum(jnt_visible, axis=1)
        less_than_threshold = np.multiply((scaled_uv_err <= threshold),
                                          jnt_visible)
        PCKh = np.divide(100.*np.sum(less_than_threshold, axis=1), jnt_count)

        # sweep thresholds from 0 to 0.5 so Mean@0.1 can be reported
        rng = np.arange(0, 0.5+0.01, 0.01)
        pckAll = np.zeros((len(rng), 16))

        for r in range(len(rng)):
            threshold = rng[r]
            less_than_threshold = np.multiply(scaled_uv_err <= threshold,
                                              jnt_visible)
            pckAll[r, :] = np.divide(100.*np.sum(less_than_threshold, axis=1),
                                     jnt_count)

        PCKh = np.ma.array(PCKh, mask=False)
        PCKh.mask[6:8] = True

        jnt_count = np.ma.array(jnt_count, mask=False)
        jnt_count.mask[6:8] = True
        jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)

        name_value = [
            ('Head', PCKh[head]),
            ('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])),
            ('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])),
            ('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])),
            ('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])),
            ('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])),
            ('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])),
            ('Mean', np.sum(PCKh * jnt_ratio)),
            ('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio))
        ]
        name_value = OrderedDict(name_value)

        return name_value, name_value['Mean']
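A toy rerun of the PCKh normalization used above (values are made up): each pixel error is divided by SC_BIAS times the head-box diagonal, then compared against the threshold.

import numpy as np

headsizes = np.array([30.0, 25.0, 40.0]) * 0.6   # SC_BIAS-scaled head sizes
uv_err = np.array([[5.0, 20.0, 8.0]])            # one joint across three images
scaled = uv_err / headsizes                      # -> [[0.278, 1.333, 0.333]]
pckh = 100.0 * np.mean(scaled <= 0.5, axis=1)    # -> [66.67], PCKh@0.5 per joint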
@@ -0,0 +1,11 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import models.pose_resnet
@@ -0,0 +1,257 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import logging

import torch
import torch.nn as nn


BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion,
                                  momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class PoseResNet(nn.Module):

    def __init__(self, block, layers, cfg, **kwargs):
        self.inplanes = 64
        extra = cfg.MODEL.EXTRA
        self.deconv_with_bias = extra.DECONV_WITH_BIAS

        super(PoseResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        # used for deconv layers
        self.deconv_layers = self._make_deconv_layer(
            extra.NUM_DECONV_LAYERS,
            extra.NUM_DECONV_FILTERS,
            extra.NUM_DECONV_KERNELS,
        )

        self.final_layer = nn.Conv2d(
            in_channels=extra.NUM_DECONV_FILTERS[-1],
            out_channels=cfg.MODEL.NUM_JOINTS,
            kernel_size=extra.FINAL_CONV_KERNEL,
            stride=1,
            padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0
        )

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def _get_deconv_cfg(self, deconv_kernel, index):
        if deconv_kernel == 4:
            padding = 1
            output_padding = 0
        elif deconv_kernel == 3:
            padding = 1
            output_padding = 1
        elif deconv_kernel == 2:
            padding = 0
            output_padding = 0
        else:
            raise ValueError(
                'deconv kernel size {} is not supported'.format(deconv_kernel))

        return deconv_kernel, padding, output_padding

    def _make_deconv_layer(self, num_layers, num_filters, num_kernels):
        assert num_layers == len(num_filters), \
            'ERROR: num_deconv_layers is different from len(num_deconv_filters)'
        assert num_layers == len(num_kernels), \
            'ERROR: num_deconv_layers is different from len(num_deconv_kernels)'

        layers = []
        for i in range(num_layers):
            kernel, padding, output_padding = \
                self._get_deconv_cfg(num_kernels[i], i)

            planes = num_filters[i]
            layers.append(
                nn.ConvTranspose2d(
                    in_channels=self.inplanes,
                    out_channels=planes,
                    kernel_size=kernel,
                    stride=2,
                    padding=padding,
                    output_padding=output_padding,
                    bias=self.deconv_with_bias))
            layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))
            layers.append(nn.ReLU(inplace=True))
            self.inplanes = planes

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.deconv_layers(x)
        x = self.final_layer(x)

        return x

    def init_weights(self, pretrained=''):
        if os.path.isfile(pretrained):
            logger.info('=> init deconv weights from normal distribution')
            for name, m in self.deconv_layers.named_modules():
                if isinstance(m, nn.ConvTranspose2d):
                    logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))
                    logger.info('=> init {}.bias as 0'.format(name))
                    nn.init.normal_(m.weight, std=0.001)
                    if self.deconv_with_bias:
                        nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.BatchNorm2d):
                    logger.info('=> init {}.weight as 1'.format(name))
                    logger.info('=> init {}.bias as 0'.format(name))
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
            logger.info('=> init final conv weights from normal distribution')
            for m in self.final_layer.modules():
                if isinstance(m, nn.Conv2d):
                    # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                    logger.info('=> init final_layer weight as normal(0, 0.001), bias as 0')
                    nn.init.normal_(m.weight, std=0.001)
                    nn.init.constant_(m.bias, 0)

            pretrained_state_dict = torch.load(pretrained)
            logger.info('=> loading pretrained model {}'.format(pretrained))
            self.load_state_dict(pretrained_state_dict, strict=False)
        else:
            logger.error('=> imagenet pretrained model does not exist')
            logger.error('=> please download it first')
            raise ValueError('imagenet pretrained model does not exist')


resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]),
               34: (BasicBlock, [3, 4, 6, 3]),
               50: (Bottleneck, [3, 4, 6, 3]),
               101: (Bottleneck, [3, 4, 23, 3]),
               152: (Bottleneck, [3, 8, 36, 3])}


def get_pose_net(cfg, is_train, **kwargs):
    num_layers = cfg.MODEL.EXTRA.NUM_LAYERS

    block_class, layers = resnet_spec[num_layers]

    model = PoseResNet(block_class, layers, cfg, **kwargs)

    if is_train and cfg.MODEL.INIT_WEIGHTS:
        model.init_weights(cfg.MODEL.PRETRAINED)

    return model
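A quick shape check for get_pose_net (a sketch: the SimpleNamespace config below only mimics the fields this file reads and is an assumption, not the project's real config object):

from types import SimpleNamespace
import torch

extra = SimpleNamespace(DECONV_WITH_BIAS=False, NUM_DECONV_LAYERS=3,
                        NUM_DECONV_FILTERS=[256, 256, 256],
                        NUM_DECONV_KERNELS=[4, 4, 4],
                        FINAL_CONV_KERNEL=1, NUM_LAYERS=50)
cfg = SimpleNamespace(MODEL=SimpleNamespace(EXTRA=extra, NUM_JOINTS=17,
                                            INIT_WEIGHTS=False, PRETRAINED=''))
model = get_pose_net(cfg, is_train=False)
model.eval()
out = model(torch.randn(1, 3, 256, 192))
# the stem downsamples by 32 and the three stride-2 deconvs upsample by 8,
# so a 256x192 input yields out.shape == torch.Size([1, 17, 64, 48])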
@@ -0,0 +1,67 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------

import numpy as np
cimport numpy as np

cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
    return a if a >= b else b

cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
    return a if a <= b else b

def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, float thresh):
    cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
    cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
    cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
    cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
    cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]

    cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # cast to np.int_ so the dtype matches the np.int_t buffer declaration
    cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1].astype(np.int_)

    cdef int ndets = dets.shape[0]
    cdef np.ndarray[np.int_t, ndim=1] suppressed = \
        np.zeros((ndets), dtype=np.int_)

    # nominal indices
    cdef int _i, _j
    # sorted indices
    cdef int i, j
    # temp variables for box i's (the box currently under consideration)
    cdef np.float32_t ix1, iy1, ix2, iy2, iarea
    # variables for computing overlap with box j (lower scoring box)
    cdef np.float32_t xx1, yy1, xx2, yy2
    cdef np.float32_t w, h
    cdef np.float32_t inter, ovr

    keep = []
    for _i in range(ndets):
        i = order[_i]
        if suppressed[i] == 1:
            continue
        keep.append(i)
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = order[_j]
            if suppressed[j] == 1:
                continue
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (iarea + areas[j] - inter)
            if ovr >= thresh:
                suppressed[j] = 1

    return keep
@@ -0,0 +1,2 @@
void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
          int boxes_dim, float nms_overlap_thresh, int device_id);
@@ -0,0 +1,30 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------

import numpy as np
cimport numpy as np

assert sizeof(int) == sizeof(np.int32_t)

cdef extern from "gpu_nms.hpp":
    void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)

def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, float thresh,
            np.int32_t device_id=0):
    cdef int boxes_num = dets.shape[0]
    cdef int boxes_dim = dets.shape[1]
    cdef int num_out
    cdef np.ndarray[np.int32_t, ndim=1] \
        keep = np.zeros(boxes_num, dtype=np.int32)
    cdef np.ndarray[np.float32_t, ndim=1] \
        scores = dets[:, 4]
    cdef np.ndarray[np.int32_t, ndim=1] \
        order = scores.argsort()[::-1].astype(np.int32)
    cdef np.ndarray[np.float32_t, ndim=2] \
        sorted_dets = dets[order, :]
    _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
    keep = keep[:num_out]
    return list(order[keep])
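A hedged usage sketch (assumes the extension has been built with the setup.py later in this commit):

# dets: an (N, 5) float32 array of [x1, y1, x2, y2, score]; the return value
# indexes into the original (unsorted) array, highest score first.
#   from nms.gpu_nms import gpu_nms
#   keep = gpu_nms(dets, 0.3, device_id=0)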
@@ -0,0 +1,123 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from .cpu_nms import cpu_nms
from .gpu_nms import gpu_nms


def py_nms_wrapper(thresh):
    def _nms(dets):
        return nms(dets, thresh)
    return _nms


def cpu_nms_wrapper(thresh):
    def _nms(dets):
        return cpu_nms(dets, thresh)
    return _nms


def gpu_nms_wrapper(thresh, device_id):
    def _nms(dets):
        return gpu_nms(dets, thresh, device_id)
    return _nms


def nms(dets, thresh):
    """
    Greedily select high-scoring boxes, suppressing any box whose overlap
    with an already selected box is >= thresh.
    :param dets: [[x1, y1, x2, y2, score]]
    :param thresh: retain overlap < thresh
    :return: indexes to keep
    """
    if dets.shape[0] == 0:
        return []

    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep

def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
    if not isinstance(sigmas, np.ndarray):
        sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
    variances = (sigmas * 2) ** 2
    xg = g[0::3]
    yg = g[1::3]
    vg = g[2::3]
    ious = np.zeros((d.shape[0]))
    for n_d in range(0, d.shape[0]):
        xd = d[n_d, 0::3]
        yd = d[n_d, 1::3]
        vd = d[n_d, 2::3]
        dx = xd - xg
        dy = yd - yg
        e = (dx ** 2 + dy ** 2) / variances / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2
        if in_vis_thre is not None:
            # element-wise AND: keep only joints visible in both detections
            ind = np.logical_and(vg > in_vis_thre, vd > in_vis_thre)
            e = e[ind]
        ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0
    return ious

def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
    """
    Greedily select high-confidence keypoint sets, suppressing any whose OKS
    overlap with an already selected set is >= thresh.
    :param kpts_db: list of dicts with 'score', 'keypoints' and 'area'
    :param thresh: retain overlap < thresh
    :return: indexes to keep
    """
    if len(kpts_db) == 0:
        return []

    scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
    kpts = np.array([kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
    areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])

    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], sigmas, in_vis_thre)

        inds = np.where(oks_ovr <= thresh)[0]
        order = order[inds + 1]

    return keep
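A toy run of the greedy IoU NMS above (illustrative numbers; note the +1 in the area and intersection arithmetic, which treats coordinates as inclusive pixel indices):

import numpy as np

dets = np.array([[0, 0, 10, 10, 0.9],
                 [1, 1, 11, 11, 0.8],    # IoU with box 0 is 100/142, about 0.70
                 [20, 20, 30, 30, 0.7]], dtype=np.float32)
print(nms(dets, 0.5))                    # -> [0, 2]; box 1 is suppressed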
@@ -0,0 +1,143 @@
// ------------------------------------------------------------------
// Copyright (c) Microsoft
// Licensed under The MIT License
// Modified from MATLAB Faster R-CNN (https://github.com/shaoqingren/faster_rcnn)
// ------------------------------------------------------------------

#include "gpu_nms.hpp"
#include <vector>
#include <iostream>
#include <cstring>  // for memset

#define CUDA_CHECK(condition) \
  /* Code block avoids redefinition of cudaError_t error */ \
  do { \
    cudaError_t error = condition; \
    if (error != cudaSuccess) { \
      std::cout << cudaGetErrorString(error) << std::endl; \
    } \
  } while (0)

#define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
int const threadsPerBlock = sizeof(unsigned long long) * 8;

__device__ inline float devIoU(float const * const a, float const * const b) {
  float left = max(a[0], b[0]), right = min(a[2], b[2]);
  float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
  float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
  float interS = width * height;
  float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
  float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
  return interS / (Sa + Sb - interS);
}

__global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
                           const float *dev_boxes, unsigned long long *dev_mask) {
  const int row_start = blockIdx.y;
  const int col_start = blockIdx.x;

  // if (row_start > col_start) return;

  const int row_size =
        min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
  const int col_size =
        min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);

  __shared__ float block_boxes[threadsPerBlock * 5];
  if (threadIdx.x < col_size) {
    block_boxes[threadIdx.x * 5 + 0] =
        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
    block_boxes[threadIdx.x * 5 + 1] =
        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
    block_boxes[threadIdx.x * 5 + 2] =
        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
    block_boxes[threadIdx.x * 5 + 3] =
        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
    block_boxes[threadIdx.x * 5 + 4] =
        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
  }
  __syncthreads();

  if (threadIdx.x < row_size) {
    const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
    const float *cur_box = dev_boxes + cur_box_idx * 5;
    int i = 0;
    unsigned long long t = 0;
    int start = 0;
    if (row_start == col_start) {
      start = threadIdx.x + 1;
    }
    for (i = start; i < col_size; i++) {
      if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
        t |= 1ULL << i;
      }
    }
    const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
    dev_mask[cur_box_idx * col_blocks + col_start] = t;
  }
}

void _set_device(int device_id) {
  int current_device;
  CUDA_CHECK(cudaGetDevice(&current_device));
  if (current_device == device_id) {
    return;
  }
  // The call to cudaSetDevice must come before any calls to Get, which
  // may perform initialization using the GPU.
  CUDA_CHECK(cudaSetDevice(device_id));
}

void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
          int boxes_dim, float nms_overlap_thresh, int device_id) {
  _set_device(device_id);

  float* boxes_dev = NULL;
  unsigned long long* mask_dev = NULL;

  const int col_blocks = DIVUP(boxes_num, threadsPerBlock);

  CUDA_CHECK(cudaMalloc(&boxes_dev,
                        boxes_num * boxes_dim * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(boxes_dev,
                        boxes_host,
                        boxes_num * boxes_dim * sizeof(float),
                        cudaMemcpyHostToDevice));

  CUDA_CHECK(cudaMalloc(&mask_dev,
                        boxes_num * col_blocks * sizeof(unsigned long long)));

  dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
              DIVUP(boxes_num, threadsPerBlock));
  dim3 threads(threadsPerBlock);
  nms_kernel<<<blocks, threads>>>(boxes_num,
                                  nms_overlap_thresh,
                                  boxes_dev,
                                  mask_dev);

  std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
  CUDA_CHECK(cudaMemcpy(&mask_host[0],
                        mask_dev,
                        sizeof(unsigned long long) * boxes_num * col_blocks,
                        cudaMemcpyDeviceToHost));

  std::vector<unsigned long long> remv(col_blocks);
  memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);

  int num_to_keep = 0;
  for (int i = 0; i < boxes_num; i++) {
    int nblock = i / threadsPerBlock;
    int inblock = i % threadsPerBlock;

    if (!(remv[nblock] & (1ULL << inblock))) {
      keep_out[num_to_keep++] = i;
      unsigned long long *p = &mask_host[0] + i * col_blocks;
      for (int j = nblock; j < col_blocks; j++) {
        remv[j] |= p[j];
      }
    }
  }
  *num_out = num_to_keep;

  CUDA_CHECK(cudaFree(boxes_dev));
  CUDA_CHECK(cudaFree(mask_dev));
}
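The kernel writes one 64-bit suppression mask per (box, column-block) pair; the host loop then keeps a box only if no higher-scoring kept box has flagged it. A pure-Python sketch of that host-side reduction (illustrative, not part of the commit):

def keep_from_mask(mask_host, boxes_num, threads_per_block=64):
    col_blocks = (boxes_num + threads_per_block - 1) // threads_per_block
    remv = [0] * col_blocks          # accumulated suppression bits
    keep = []
    for i in range(boxes_num):       # boxes arrive sorted by score
        nblock, inblock = divmod(i, threads_per_block)
        if not (remv[nblock] >> inblock) & 1:
            keep.append(i)
            for j in range(nblock, col_blocks):
                remv[j] |= mask_host[i * col_blocks + j]
    return keep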
@@ -0,0 +1,140 @@
# --------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# --------------------------------------------------------

import os
from os.path import join as pjoin
from setuptools import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np


def find_in_path(name, path):
    "Find a file in a search path"
    # Adapted from
    # http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
    for dir in path.split(os.pathsep):
        binpath = pjoin(dir, name)
        if os.path.exists(binpath):
            return os.path.abspath(binpath)
    return None


def locate_cuda():
    """Locate the CUDA environment on the system

    Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
    and values giving the absolute path to each directory.

    Starts by looking for the CUDAHOME env variable. If not found, everything
    is based on finding 'nvcc' in the PATH.
    """

    # first check if the CUDAHOME env variable is in use
    if 'CUDAHOME' in os.environ:
        home = os.environ['CUDAHOME']
        nvcc = pjoin(home, 'bin', 'nvcc')
    else:
        # otherwise, search the PATH for NVCC
        default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
        nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
        if nvcc is None:
            raise EnvironmentError('The nvcc binary could not be '
                'located in your $PATH. Either add it to your path, or set $CUDAHOME')
        home = os.path.dirname(os.path.dirname(nvcc))

    cudaconfig = {'home':home, 'nvcc':nvcc,
                  'include': pjoin(home, 'include'),
                  'lib64': pjoin(home, 'lib64')}
    for k, v in cudaconfig.items():
        if not os.path.exists(v):
            raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))

    return cudaconfig


CUDA = locate_cuda()


# Obtain the numpy include directory. This logic works across numpy versions.
try:
    numpy_include = np.get_include()
except AttributeError:
    numpy_include = np.get_numpy_include()


def customize_compiler_for_nvcc(self):
    """inject deep into distutils to customize how the dispatch
    to gcc/nvcc works.

    If you subclass UnixCCompiler, it's not trivial to get your subclass
    injected in, and still have the right customizations (i.e.
    distutils.sysconfig.customize_compiler) run on it. So instead of going
    the OO route, I have this. Note, it's kind of like a weird functional
    subclassing going on."""

    # tell the compiler it can process .cu
    self.src_extensions.append('.cu')

    # save references to the default compiler_so and _compile methods
    default_compiler_so = self.compiler_so
    default_compile = self._compile

    # now redefine the _compile method. This gets executed for each
    # object but distutils doesn't have the ability to change compilers
    # based on source extension: we add it.
    def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
        if os.path.splitext(src)[1] == '.cu':
            # use cuda for .cu files
            self.set_executable('compiler_so', CUDA['nvcc'])
            # use only a subset of the extra_postargs, which are 1-1 translated
            # from the extra_compile_args in the Extension class
            postargs = extra_postargs['nvcc']
        else:
            postargs = extra_postargs['gcc']

        default_compile(obj, src, ext, cc_args, postargs, pp_opts)
        # reset the default compiler_so, which we might have changed for cuda
        self.compiler_so = default_compiler_so

    # inject our redefined _compile method into the class
    self._compile = _compile


# run the customize_compiler
class custom_build_ext(build_ext):
    def build_extensions(self):
        customize_compiler_for_nvcc(self.compiler)
        build_ext.build_extensions(self)


ext_modules = [
    Extension(
        "cpu_nms",
        ["cpu_nms.pyx"],
        extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
        include_dirs=[numpy_include]
    ),
    Extension('gpu_nms',
              ['nms_kernel.cu', 'gpu_nms.pyx'],
              library_dirs=[CUDA['lib64']],
              libraries=['cudart'],
              language='c++',
              runtime_library_dirs=[CUDA['lib64']],
              # this syntax is specific to this build system
              # we're only going to use certain compiler args with nvcc and not with
              # gcc; the implementation of this trick is in
              # customize_compiler_for_nvcc() above
              extra_compile_args={'gcc': ["-Wno-unused-function"],
                                  'nvcc': ['-arch=sm_35',
                                           '--ptxas-options=-v',
                                           '-c',
                                           '--compiler-options',
                                           "'-fPIC'"]},
              include_dirs=[numpy_include, CUDA['include']]
              ),
]

setup(
    name='nms',
    ext_modules=ext_modules,
    # inject our custom trigger
    cmdclass={'build_ext': custom_build_ext},
)
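A conventional in-place build invocation, assuming this setup.py sits next to the .pyx/.cu sources (the exact directory layout is not shown in this diff); it cythonizes cpu_nms.pyx with gcc and compiles nms_kernel.cu with nvcc before linking against cudart:

    python setup.py build_ext --inplace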
@@ -0,0 +1,123 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import cv2


def flip_back(output_flipped, matched_parts):
    '''
    output_flipped: numpy.ndarray(batch_size, num_joints, height, width)
    '''
    assert output_flipped.ndim == 4,\
        'output_flipped should be [batch_size, num_joints, height, width]'

    output_flipped = output_flipped[:, :, :, ::-1]

    for pair in matched_parts:
        tmp = output_flipped[:, pair[0], :, :].copy()
        output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
        output_flipped[:, pair[1], :, :] = tmp

    return output_flipped


def fliplr_joints(joints, joints_vis, width, matched_parts):
    """
    flip coords
    """
    # Flip horizontal
    joints[:, 0] = width - joints[:, 0] - 1

    # Change left-right parts
    for pair in matched_parts:
        joints[pair[0], :], joints[pair[1], :] = \
            joints[pair[1], :], joints[pair[0], :].copy()
        joints_vis[pair[0], :], joints_vis[pair[1], :] = \
            joints_vis[pair[1], :], joints_vis[pair[0], :].copy()

    return joints*joints_vis, joints_vis


def transform_preds(coords, center, scale, output_size):
    target_coords = np.zeros(coords.shape)
    trans = get_affine_transform(center, scale, 0, output_size, inv=1)
    for p in range(coords.shape[0]):
        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
    return target_coords


def get_affine_transform(center,
                         scale,
                         rot,
                         output_size,
                         shift=np.array([0, 0], dtype=np.float32),
                         inv=0):
    if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
        scale = np.array([scale, scale])

    scale_tmp = scale * 200.0
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    rot_rad = np.pi * rot / 180
    src_dir = get_dir([0, src_w * -0.5], rot_rad)
    dst_dir = np.array([0, dst_w * -0.5], np.float32)

    src = np.zeros((3, 2), dtype=np.float32)
    dst = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir

    src[2:, :] = get_3rd_point(src[0, :], src[1, :])
    dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans


def affine_transform(pt, t):
    new_pt = np.array([pt[0], pt[1], 1.]).T
    new_pt = np.dot(t, new_pt)
    return new_pt[:2]


def get_3rd_point(a, b):
    direct = a - b
    return b + np.array([-direct[1], direct[0]], dtype=np.float32)


def get_dir(src_point, rot_rad):
    sn, cs = np.sin(rot_rad), np.cos(rot_rad)

    src_result = [0, 0]
    src_result[0] = src_point[0] * cs - src_point[1] * sn
    src_result[1] = src_point[0] * sn + src_point[1] * cs

    return src_result


def crop(img, center, scale, output_size, rot=0):
    trans = get_affine_transform(center, scale, rot, output_size)

    dst_img = cv2.warpAffine(img,
                             trans,
                             (int(output_size[0]), int(output_size[1])),
                             flags=cv2.INTER_LINEAR)

    return dst_img
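A quick sanity check of the affine helpers above (illustrative values; the scale follows the pixel_std = 200 convention used throughout this commit):

import numpy as np

center = np.array([320.0, 240.0])
scale = np.array([1.28, 1.28])          # 1.28 * 200 = a 256-pixel source box
trans = get_affine_transform(center, scale, 0, [192, 256])
pt = affine_transform(np.array([320.0, 240.0]), trans)
# the crop center lands on the output center: pt is approximately [96., 128.]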
@@ -0,0 +1,83 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import logging
import time
from pathlib import Path

import torch
import torch.optim as optim

from core.config import get_model_name


def create_logger(cfg, cfg_name, phase='train'):
    root_output_dir = Path(cfg.OUTPUT_DIR)
    # set up logger
    if not root_output_dir.exists():
        print('=> creating {}'.format(root_output_dir))
        root_output_dir.mkdir()

    dataset = cfg.DATASET.DATASET + '_' + cfg.DATASET.HYBRID_JOINTS_TYPE \
        if cfg.DATASET.HYBRID_JOINTS_TYPE else cfg.DATASET.DATASET
    dataset = dataset.replace(':', '_')
    model, _ = get_model_name(cfg)
    cfg_name = os.path.basename(cfg_name).split('.')[0]

    final_output_dir = root_output_dir / dataset / model / cfg_name

    print('=> creating {}'.format(final_output_dir))
    final_output_dir.mkdir(parents=True, exist_ok=True)

    time_str = time.strftime('%Y-%m-%d-%H-%M')
    log_file = '{}_{}_{}.log'.format(cfg_name, time_str, phase)
    final_log_file = final_output_dir / log_file
    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(filename=str(final_log_file),
                        format=head)
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console = logging.StreamHandler()
    logging.getLogger('').addHandler(console)

    tensorboard_log_dir = Path(cfg.LOG_DIR) / dataset / model / \
        (cfg_name + '_' + time_str)
    print('=> creating {}'.format(tensorboard_log_dir))
    tensorboard_log_dir.mkdir(parents=True, exist_ok=True)

    return logger, str(final_output_dir), str(tensorboard_log_dir)


def get_optimizer(cfg, model):
    optimizer = None
    if cfg.TRAIN.OPTIMIZER == 'sgd':
        optimizer = optim.SGD(
            model.parameters(),
            lr=cfg.TRAIN.LR,
            momentum=cfg.TRAIN.MOMENTUM,
            weight_decay=cfg.TRAIN.WD,
            nesterov=cfg.TRAIN.NESTEROV
        )
    elif cfg.TRAIN.OPTIMIZER == 'adam':
        optimizer = optim.Adam(
            model.parameters(),
            lr=cfg.TRAIN.LR
        )

    return optimizer


def save_checkpoint(states, is_best, output_dir,
                    filename='checkpoint.pth.tar'):
    torch.save(states, os.path.join(output_dir, filename))
    if is_best and 'state_dict' in states:
        torch.save(states['state_dict'],
                   os.path.join(output_dir, 'model_best.pth.tar'))
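A minimal sketch of wiring these helpers together — assumed names only (`config` stands for the EasyDict exported by core.config, `model` for an already-built pose network, and the cfg file name is hypothetical; none of this is part of the commit):

```python
# Assumes: from core.config import config; model = <a pose network>
logger, final_output_dir, tb_log_dir = create_logger(
    config, 'my_experiment.yaml', 'train')   # hypothetical cfg name
optimizer = get_optimizer(config, model)     # SGD or Adam per config.TRAIN.OPTIMIZER

save_checkpoint({
    'epoch': 1,
    'state_dict': model.state_dict(),
    'perf': 0.0,
    'optimizer': optimizer.state_dict(),
}, is_best=True, output_dir=final_output_dir)  # is_best also writes model_best.pth.tar
```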
@@ -0,0 +1,141 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import numpy as np
import torchvision
import cv2

from core.inference import get_max_preds


def save_batch_image_with_joints(batch_image, batch_joints, batch_joints_vis,
                                 file_name, nrow=8, padding=2):
    '''
    batch_image: [batch_size, channel, height, width]
    batch_joints: [batch_size, num_joints, 3]
    batch_joints_vis: [batch_size, num_joints, 1]
    '''
    grid = torchvision.utils.make_grid(batch_image, nrow, padding, True)
    ndarr = grid.mul(255).clamp(0, 255).byte().permute(1, 2, 0).cpu().numpy()
    ndarr = ndarr.copy()

    nmaps = batch_image.size(0)
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height = int(batch_image.size(2) + padding)
    width = int(batch_image.size(3) + padding)
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            joints = batch_joints[k]
            joints_vis = batch_joints_vis[k]

            for joint, joint_vis in zip(joints, joints_vis):
                # shift joint coordinates into this image's cell of the grid
                joint[0] = x * width + padding + joint[0]
                joint[1] = y * height + padding + joint[1]
                if joint_vis[0]:
                    cv2.circle(ndarr, (int(joint[0]), int(joint[1])), 2, [255, 0, 0], 2)
            k = k + 1
    cv2.imwrite(file_name, ndarr)


def save_batch_heatmaps(batch_image, batch_heatmaps, file_name,
                        normalize=True):
    '''
    batch_image: [batch_size, channel, height, width]
    batch_heatmaps: [batch_size, num_joints, height, width]
    file_name: saved file name
    '''
    if normalize:
        batch_image = batch_image.clone()
        min_val = float(batch_image.min())
        max_val = float(batch_image.max())

        batch_image.add_(-min_val).div_(max_val - min_val + 1e-5)

    batch_size = batch_heatmaps.size(0)
    num_joints = batch_heatmaps.size(1)
    heatmap_height = batch_heatmaps.size(2)
    heatmap_width = batch_heatmaps.size(3)

    # one row per sample: the resized input image followed by one
    # heatmap overlay per joint
    grid_image = np.zeros((batch_size*heatmap_height,
                           (num_joints+1)*heatmap_width,
                           3),
                          dtype=np.uint8)

    preds, maxvals = get_max_preds(batch_heatmaps.detach().cpu().numpy())

    for i in range(batch_size):
        image = batch_image[i].mul(255)\
                              .clamp(0, 255)\
                              .byte()\
                              .permute(1, 2, 0)\
                              .cpu().numpy()
        heatmaps = batch_heatmaps[i].mul(255)\
                                    .clamp(0, 255)\
                                    .byte()\
                                    .cpu().numpy()

        resized_image = cv2.resize(image,
                                   (int(heatmap_width), int(heatmap_height)))

        height_begin = heatmap_height * i
        height_end = heatmap_height * (i + 1)
        for j in range(num_joints):
            cv2.circle(resized_image,
                       (int(preds[i][j][0]), int(preds[i][j][1])),
                       1, [0, 0, 255], 1)
            heatmap = heatmaps[j, :, :]
            colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
            masked_image = colored_heatmap*0.7 + resized_image*0.3
            cv2.circle(masked_image,
                       (int(preds[i][j][0]), int(preds[i][j][1])),
                       1, [0, 0, 255], 1)

            width_begin = heatmap_width * (j+1)
            width_end = heatmap_width * (j+2)
            grid_image[height_begin:height_end, width_begin:width_end, :] = \
                masked_image
            # grid_image[height_begin:height_end, width_begin:width_end, :] = \
            #     colored_heatmap*0.7 + resized_image*0.3

        grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image

    cv2.imwrite(file_name, grid_image)


def save_debug_images(config, input, meta, target, joints_pred, output,
                      prefix):
    if not config.DEBUG.DEBUG:
        return

    if config.DEBUG.SAVE_BATCH_IMAGES_GT:
        save_batch_image_with_joints(
            input, meta['joints'], meta['joints_vis'],
            '{}_gt.jpg'.format(prefix)
        )
    if config.DEBUG.SAVE_BATCH_IMAGES_PRED:
        save_batch_image_with_joints(
            input, joints_pred, meta['joints_vis'],
            '{}_pred.jpg'.format(prefix)
        )
    if config.DEBUG.SAVE_HEATMAPS_GT:
        save_batch_heatmaps(
            input, target, '{}_hm_gt.jpg'.format(prefix)
        )
    if config.DEBUG.SAVE_HEATMAPS_PRED:
        save_batch_heatmaps(
            input, output, '{}_hm_pred.jpg'.format(prefix)
        )
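For reference, a hedged example of exercising the heatmap visualizer with random tensors, shaped per the docstrings above (16 joints, as in MPII, is an assumption; the file name is invented):

```python
import torch

batch_image = torch.rand(4, 3, 256, 192)     # [batch, channel, height, width]
batch_heatmaps = torch.rand(4, 16, 64, 48)   # [batch, num_joints, height, width]
save_batch_heatmaps(batch_image, batch_heatmaps, 'debug_heatmaps.jpg')
```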
@@ -0,0 +1,70 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import zipfile
import xml.etree.ElementTree as ET

import cv2
import numpy as np

_im_zfile = []
_xml_path_zip = []
_xml_zfile = []


def imread(filename, flags=cv2.IMREAD_COLOR):
    global _im_zfile
    path = filename
    # str.find returns -1 when '@' is absent; str.index would raise instead
    pos_at = path.find('@')
    if pos_at == -1:
        print("character '@' is not found from the given path '%s'" % (path))
        assert 0
    path_zip = path[0: pos_at]
    path_img = path[pos_at + 2:]
    if not os.path.isfile(path_zip):
        print("zip file '%s' is not found" % (path_zip))
        assert 0
    # reuse an already-opened archive when possible
    for i in range(len(_im_zfile)):
        if _im_zfile[i]['path'] == path_zip:
            data = _im_zfile[i]['zipfile'].read(path_img)
            return cv2.imdecode(np.frombuffer(data, np.uint8), flags)

    _im_zfile.append({
        'path': path_zip,
        'zipfile': zipfile.ZipFile(path_zip, 'r')
    })
    data = _im_zfile[-1]['zipfile'].read(path_img)

    return cv2.imdecode(np.frombuffer(data, np.uint8), flags)


def xmlread(filename):
    global _xml_path_zip
    global _xml_zfile
    path = filename
    pos_at = path.find('@')
    if pos_at == -1:
        print("character '@' is not found from the given path '%s'" % (path))
        assert 0
    path_zip = path[0: pos_at]
    path_xml = path[pos_at + 2:]
    if not os.path.isfile(path_zip):
        print("zip file '%s' is not found" % (path_zip))
        assert 0
    for i in range(len(_xml_path_zip)):
        if _xml_path_zip[i] == path_zip:
            data = _xml_zfile[i].open(path_xml)
            return ET.fromstring(data.read())
    _xml_path_zip.append(path_zip)
    print("read new xml file '%s'" % (path_zip))
    _xml_zfile.append(zipfile.ZipFile(path_zip, 'r'))
    data = _xml_zfile[-1].open(path_xml)
    return ET.fromstring(data.read())
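The path convention deserves a note: everything before '@' names the archive, and the member path starts two characters after it, so the slash right after '@' is skipped. A hypothetical call (paths invented):

```python
# 'images.zip@/train/0001.jpg' -> archive 'images.zip', member 'train/0001.jpg'
img = imread('data/images.zip@/train/0001.jpg')
```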
@@ -0,0 +1,23 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os.path as osp
import sys


def add_path(path):
    if path not in sys.path:
        sys.path.insert(0, path)


this_dir = osp.dirname(__file__)

lib_path = osp.join(this_dir, '..', 'lib')
add_path(lib_path)
@@ -0,0 +1,206 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import pprint
import shutil

import torch
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter

import _init_paths
from core.config import config
from core.config import update_config
from core.config import update_dir
from core.config import get_model_name
from core.loss import JointsMSELoss
from core.function import train
from core.function import validate
from utils.utils import get_optimizer
from utils.utils import save_checkpoint
from utils.utils import create_logger

import dataset
import models


def parse_args():
    parser = argparse.ArgumentParser(description='Train keypoints network')
    # general
    parser.add_argument('--cfg',
                        help='experiment configure file name',
                        required=True,
                        type=str)

    args, rest = parser.parse_known_args()
    # update config
    update_config(args.cfg)

    # training
    parser.add_argument('--frequent',
                        help='frequency of logging',
                        default=config.PRINT_FREQ,
                        type=int)
    parser.add_argument('--gpus',
                        help='gpus',
                        type=str)
    parser.add_argument('--workers',
                        help='num of dataloader workers',
                        type=int)

    args = parser.parse_args()

    return args


def reset_config(config, args):
    if args.gpus:
        config.GPUS = args.gpus
    if args.workers:
        config.WORKERS = args.workers


def main():
    args = parse_args()
    reset_config(config, args)

    logger, final_output_dir, tb_log_dir = create_logger(
        config, args.cfg, 'train')

    logger.info(pprint.pformat(args))
    logger.info(pprint.pformat(config))

    # cudnn related setting
    cudnn.benchmark = config.CUDNN.BENCHMARK
    torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC
    torch.backends.cudnn.enabled = config.CUDNN.ENABLED

    model = eval('models.'+config.MODEL.NAME+'.get_pose_net')(
        config, is_train=True
    )

    # copy model file
    this_dir = os.path.dirname(__file__)
    shutil.copy2(
        os.path.join(this_dir, '../lib/models', config.MODEL.NAME + '.py'),
        final_output_dir)

    writer_dict = {
        'writer': SummaryWriter(log_dir=tb_log_dir),
        'train_global_steps': 0,
        'valid_global_steps': 0,
    }

    # the network consumes 3-channel images, so the dummy input for the
    # graph writer must be (batch, 3, height, width)
    dump_input = torch.rand((config.TRAIN.BATCH_SIZE,
                             3,
                             config.MODEL.IMAGE_SIZE[1],
                             config.MODEL.IMAGE_SIZE[0]))
    writer_dict['writer'].add_graph(model, (dump_input, ))

    gpus = [int(i) for i in config.GPUS.split(',')]
    model = torch.nn.DataParallel(model, device_ids=gpus).cuda()

    # define loss function (criterion) and optimizer
    criterion = JointsMSELoss(
        use_target_weight=config.LOSS.USE_TARGET_WEIGHT
    ).cuda()

    optimizer = get_optimizer(config, model)

    lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, config.TRAIN.LR_STEP, config.TRAIN.LR_FACTOR
    )

    # Data loading code
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    train_dataset = eval('dataset.'+config.DATASET.DATASET)(
        config,
        config.DATASET.ROOT,
        config.DATASET.TRAIN_SET,
        True,
        transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])
    )
    valid_dataset = eval('dataset.'+config.DATASET.DATASET)(
        config,
        config.DATASET.ROOT,
        config.DATASET.TEST_SET,
        False,
        transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])
    )

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=config.TRAIN.BATCH_SIZE*len(gpus),
        shuffle=config.TRAIN.SHUFFLE,
        num_workers=config.WORKERS,
        pin_memory=True
    )
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset,
        batch_size=config.TEST.BATCH_SIZE*len(gpus),
        shuffle=False,
        num_workers=config.WORKERS,
        pin_memory=True
    )

    best_perf = 0.0
    best_model = False
    for epoch in range(config.TRAIN.BEGIN_EPOCH, config.TRAIN.END_EPOCH):
        lr_scheduler.step()

        # train for one epoch
        train(config, train_loader, model, criterion, optimizer, epoch,
              final_output_dir, tb_log_dir, writer_dict)

        # evaluate on validation set
        perf_indicator = validate(config, valid_loader, valid_dataset, model,
                                  criterion, final_output_dir, tb_log_dir,
                                  writer_dict)

        if perf_indicator > best_perf:
            best_perf = perf_indicator
            best_model = True
        else:
            best_model = False

        logger.info('=> saving checkpoint to {}'.format(final_output_dir))
        save_checkpoint({
            'epoch': epoch + 1,
            'model': get_model_name(config),
            'state_dict': model.state_dict(),
            'perf': perf_indicator,
            'optimizer': optimizer.state_dict(),
        }, best_model, final_output_dir)

    final_model_state_file = os.path.join(final_output_dir,
                                          'final_state.pth.tar')
    logger.info('saving final model state to {}'.format(
        final_model_state_file))
    torch.save(model.module.state_dict(), final_model_state_file)
    writer_dict['writer'].close()


if __name__ == '__main__':
    main()
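Training is launched by pointing this script at an experiment config, e.g. `python pose_estimation/train.py --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml --gpus 0,1` (script and config paths are illustrative); `--gpus` and `--workers` override `config.GPUS` and `config.WORKERS` via `reset_config` above.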
@@ -0,0 +1,165 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import pprint

import torch
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms

import _init_paths
from core.config import config
from core.config import update_config
from core.config import update_dir
from core.loss import JointsMSELoss
from core.function import validate
from utils.utils import create_logger

import dataset
import models


def parse_args():
    parser = argparse.ArgumentParser(description='Validate keypoints network')
    # general
    parser.add_argument('--cfg',
                        help='experiment configure file name',
                        required=True,
                        type=str)

    args, rest = parser.parse_known_args()
    # update config
    update_config(args.cfg)

    # validation
    parser.add_argument('--frequent',
                        help='frequency of logging',
                        default=config.PRINT_FREQ,
                        type=int)
    parser.add_argument('--gpus',
                        help='gpus',
                        type=str)
    parser.add_argument('--workers',
                        help='num of dataloader workers',
                        type=int)
    parser.add_argument('--model-file',
                        help='model state file',
                        type=str)
    parser.add_argument('--use-detect-bbox',
                        help='use detect bbox',
                        action='store_true')
    parser.add_argument('--flip-test',
                        help='use flip test',
                        action='store_true')
    parser.add_argument('--post-process',
                        help='use post process',
                        action='store_true')
    parser.add_argument('--shift-heatmap',
                        help='shift heatmap',
                        action='store_true')
    parser.add_argument('--coco-bbox-file',
                        help='coco detection bbox file',
                        type=str)

    args = parser.parse_args()

    return args


def reset_config(config, args):
    if args.gpus:
        config.GPUS = args.gpus
    if args.workers:
        config.WORKERS = args.workers
    if args.use_detect_bbox:
        config.TEST.USE_GT_BBOX = not args.use_detect_bbox
    if args.flip_test:
        config.TEST.FLIP_TEST = args.flip_test
    if args.post_process:
        config.TEST.POST_PROCESS = args.post_process
    if args.shift_heatmap:
        config.TEST.SHIFT_HEATMAP = args.shift_heatmap
    if args.model_file:
        config.TEST.MODEL_FILE = args.model_file
    if args.coco_bbox_file:
        config.TEST.COCO_BBOX_FILE = args.coco_bbox_file


def main():
    args = parse_args()
    reset_config(config, args)

    logger, final_output_dir, tb_log_dir = create_logger(
        config, args.cfg, 'valid')

    logger.info(pprint.pformat(args))
    logger.info(pprint.pformat(config))

    # cudnn related setting
    cudnn.benchmark = config.CUDNN.BENCHMARK
    torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC
    torch.backends.cudnn.enabled = config.CUDNN.ENABLED

    model = eval('models.'+config.MODEL.NAME+'.get_pose_net')(
        config, is_train=False
    )

    if config.TEST.MODEL_FILE:
        logger.info('=> loading model from {}'.format(config.TEST.MODEL_FILE))
        model.load_state_dict(torch.load(config.TEST.MODEL_FILE))
    else:
        model_state_file = os.path.join(final_output_dir,
                                        'final_state.pth.tar')
        logger.info('=> loading model from {}'.format(model_state_file))
        model.load_state_dict(torch.load(model_state_file))

    gpus = [int(i) for i in config.GPUS.split(',')]
    model = torch.nn.DataParallel(model, device_ids=gpus).cuda()

    # define loss function (criterion) and optimizer
    criterion = JointsMSELoss(
        use_target_weight=config.LOSS.USE_TARGET_WEIGHT
    ).cuda()

    # Data loading code
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    valid_dataset = eval('dataset.'+config.DATASET.DATASET)(
        config,
        config.DATASET.ROOT,
        config.DATASET.TEST_SET,
        False,
        transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])
    )
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset,
        batch_size=config.TEST.BATCH_SIZE*len(gpus),
        shuffle=False,
        num_workers=config.WORKERS,
        pin_memory=True
    )

    # evaluate on validation set
    validate(config, valid_loader, valid_dataset, model, criterion,
             final_output_dir, tb_log_dir)


if __name__ == '__main__':
    main()
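Evaluation mirrors training: e.g. `python pose_estimation/valid.py --cfg <experiment.yaml> --flip-test --model-file <model_best.pth.tar>` (paths illustrative). Without `--model-file`, the script falls back to `final_state.pth.tar` in the experiment's output directory, per the `else` branch above.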
@@ -0,0 +1,9 @@
EasyDict==1.7
opencv-python==3.4.1.15
Cython
scipy
pandas
pyyaml
json_tricks
scikit-image
tensorboardX
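These dependencies install with `pip install -r requirements.txt`; the pins on `EasyDict` and `opencv-python` match the dotted config access and `cv2` calls used throughout the code above.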