Bin Xiao 2018-08-02 04:00:29 +00:00
Parent a362bfa1f2
Commit f926dffb45
43 changed files with 4142 additions and 12 deletions

7
.gitignore vendored

@@ -14,8 +14,6 @@ dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
@@ -102,3 +100,8 @@ venv.bak/
# mypy
.mypy_cache/
/data
/output
/models
/log

14
CONTRIBUTING.md Normal file

@@ -0,0 +1,14 @@
# Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

194
README.md

@@ -1,14 +1,188 @@
# Simple Baselines for Human Pose Estimation and Tracking
## Introduction
This is an official pytorch implementation of [*Simple Baselines for Human Pose Estimation and Tracking*](https://arxiv.org/abs/1804.06208). This work provides baseline methods that are surprisingly simple and effective, and thus helpful for inspiring and evaluating new ideas in the field. State-of-the-art results are achieved on challenging benchmarks. On the COCO keypoint validation set, our best **single model** achieves **74.3 mAP**. You can reproduce our results using this repo. All models are provided for research purposes.
## Main Results
### Results on MPII val
| arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1|
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 |
| 384x384_pose_resnet_50_d256d256d256 | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 |
| 256x256_pose_resnet_101_d256d256d256 | 96.862 | 95.873 | 89.518 | 84.376 | 88.437 | 84.486 | 80.703 | 89.131 | 34.020 |
| 384x384_pose_resnet_101_d256d256d256 | 96.965 | 95.907 | 90.268 | 85.780 | 89.597 | 85.935 | 82.098 | 90.003 | 38.860 |
| 256x256_pose_resnet_152_d256d256d256 | 97.033 | 95.941 | 90.046 | 84.976 | 89.164 | 85.311 | 81.271 | 89.620 | 35.025 |
| 384x384_pose_resnet_152_d256d256d256 | 96.794 | 95.618 | 90.080 | 86.225 | 89.700 | 86.862 | 82.853 | 90.200 | 39.433 |
### Note:
- Flip test is used
### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| 256x192_pose_resnet_50_d256d256d256 | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 |
| 384x288_pose_resnet_50_d256d256d256 | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 |
| 256x192_pose_resnet_101_d256d256d256 | 0.714 | 0.893 | 0.793 | 0.681 | 0.781 | 0.771 | 0.934 | 0.840 | 0.730 | 0.832 |
| 384x288_pose_resnet_101_d256d256d256 | 0.736 | 0.896 | 0.803 | 0.699 | 0.811 | 0.791 | 0.936 | 0.851 | 0.745 | 0.858 |
| 256x192_pose_resnet_152_d256d256d256 | 0.720 | 0.893 | 0.798 | 0.687 | 0.789 | 0.778 | 0.934 | 0.846 | 0.736 | 0.839 |
| 384x288_pose_resnet_152_d256d256d256 | 0.743 | 0.896 | 0.811 | 0.705 | 0.816 | 0.797 | 0.937 | 0.858 | 0.751 | 0.863 |
### Note:
- Flip test is used
- Person detector has person AP of 56.4 on COCO val2017 dataset
## Environment
The code is developed using Python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.
## Quick start
### Installation
1. Install pytorch >= v0.4.0 following the [official instructions](https://pytorch.org/)
2. Disable cudnn for batch_norm:
```
# PYTORCH=/path/to/pytorch
# for pytorch v0.4.0
sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
# for pytorch v0.4.1
sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
```
Note that instructions like `# PYTORCH=/path/to/pytorch` indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (`PYTORCH` in this case) accordingly.
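If you prefer not to patch the pytorch source at all, a coarser runtime alternative (an assumption on our part, not part of the original instructions: it disables cudnn for every op, not just batch_norm) is:
```python
import torch

# Coarser than the sed patch above: turns cudnn off globally,
# not only for batch_norm.
torch.backends.cudnn.enabled = False
```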
3. Clone this repo; we'll refer to the directory that you cloned as ${POSE_ROOT}
4. Install dependencies:
```
pip install -r requirements.txt
```
5. Install [COCOAPI](https://github.com/cocodataset/cocoapi):
```
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python3 setup.py install --user
```
Note that instructions like `# COCOAPI=/path/to/clone/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (`COCOAPI` in this case) accordingly.
6. Download pytorch imagenet pretrained models from the [pytorch model zoo](https://pytorch.org/docs/stable/model_zoo.html#module-torch.utils.model_zoo).
7. Download mpii and coco pretrained models from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW0D5ZE4ArK9wk_fvw). Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:
```
${POSE_ROOT}
`-- models
`-- pytorch
|-- imagenet
| |-- resnet50-19c8e357.pth
| |-- resnet101-5d3b4d8f.pth
| `-- resnet152-b121ed2d.pth
|-- pose_coco
| |-- pose_resnet_101_256x192.pth.tar
| |-- pose_resnet_101_384x288.pth.tar
| |-- pose_resnet_152_256x192.pth.tar
| |-- pose_resnet_152_384x288.pth.tar
| |-- pose_resnet_50_256x192.pth.tar
| `-- pose_resnet_50_384x288.pth.tar
`-- pose_mpii
|-- pose_resnet_101_256x256.pth.tar
|-- pose_resnet_101_384x384.pth.tar
|-- pose_resnet_152_256x256.pth.tar
|-- pose_resnet_152_384x384.pth.tar
|-- pose_resnet_50_256x256.pth.tar
`-- pose_resnet_50_384x384.pth.tar
```
8. Init output (training model output directory) and log (TensorBoard log directory) directories:
```
mkdir output
mkdir log
```
Your directory tree should look like this:
```
${POSE_ROOT}
├── data
├── experiments
├── lib
├── log
├── models
├── output
├── pose_estimation
├── README.md
└── requirements.txt
```
### Data preparation
**For MPII data**, please download from the [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/). The original annotation files are in MATLAB format; we have converted them to JSON format, which you also need to download from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW00SqrairNetmeVu4).
Extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- mpii
`-- |-- annot
| |-- gt_valid.mat
| |-- test.json
| |-- train.json
| |-- trainval.json
| `-- valid.json
`-- images
|-- 000001163.jpg
|-- 000003072.jpg
```
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzA1A-y1AH-pZQdS).
Download and extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
`-- |-- annotations
| |-- person_keypoints_train2017.json
| `-- person_keypoints_val2017.json
|-- person_detection_results
| |-- COCO_val2017_detections_AP_H_56_person.json
`-- images
|-- train2017
| |-- 000000000009.jpg
| |-- 000000000025.jpg
| |-- 000000000030.jpg
| |-- ...
`-- val2017
|-- 000000000139.jpg
|-- 000000000285.jpg
|-- 000000000632.jpg
|-- ...
```
### Validate on MPII using pretrained models
```
python pose_estimation/valid.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
--flip-test \
--model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar
```
### Training on MPII
```
python pose_estimation/train.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
```
### Validate on COCO val2017 using pretrained models
```
python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar
```
### Training on COCO train2017
```
python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml
```


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
IMAGE_SIZE:
- 192
- 256
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 48
- 64
SIGMA: 2
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 101
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
IMAGE_SIZE:
- 288
- 384
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 72
- 96
SIGMA: 3
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 101
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
IMAGE_SIZE:
- 192
- 256
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 48
- 64
SIGMA: 2
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 152
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
IMAGE_SIZE:
- 288
- 384
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 72
- 96
SIGMA: 3
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 152
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
IMAGE_SIZE:
- 192
- 256
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 48
- 64
SIGMA: 2
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 50
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,76 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: 'coco'
ROOT: 'data/coco/'
TEST_SET: 'val2017'
TRAIN_SET: 'train2017'
FLIP: true
ROT_FACTOR: 40
SCALE_FACTOR: 0.3
MODEL:
NAME: 'pose_resnet'
PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
IMAGE_SIZE:
- 288
- 384
NUM_JOINTS: 17
EXTRA:
TARGET_TYPE: 'gaussian'
HEATMAP_SIZE:
- 72
- 96
SIGMA: 3
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 50
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: 'adam'
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
BBOX_THRE: 1.0
FLIP_TEST: false
IMAGE_THRE: 0.0
IN_VIS_THRE: 0.2
MODEL_FILE: ''
NMS_THRE: 1.0
OKS_THRE: 0.9
USE_GT_BBOX: true
DEBUG:
DEBUG: true
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
IMAGE_SIZE:
- 256
- 256
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
SIGMA: 2
HEATMAP_SIZE:
- 64
- 64
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 101
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,69 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet101-5d3b4d8f.pth'
IMAGE_SIZE:
- 384
- 384
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
HEATMAP_SIZE:
- 96
- 96
SIGMA: 3
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 101
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
IMAGE_SIZE:
- 256
- 256
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
SIGMA: 2
HEATMAP_SIZE:
- 64
- 64
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 152
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet152-b121ed2d.pth'
IMAGE_SIZE:
- 256
- 256
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
SIGMA: 2
HEATMAP_SIZE:
- 64
- 64
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 152
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 24
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
IMAGE_SIZE:
- 256
- 256
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
SIGMA: 2
HEATMAP_SIZE:
- 64
- 64
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 50
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true


@@ -0,0 +1,72 @@
GPUS: '0'
DATA_DIR: ''
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 100
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
DATASET: mpii
ROOT: 'data/mpii/'
TEST_SET: valid
TRAIN_SET: train
FLIP: true
ROT_FACTOR: 30
SCALE_FACTOR: 0.25
MODEL:
NAME: pose_resnet
PRETRAINED: 'models/pytorch/imagenet/resnet50-19c8e357.pth'
IMAGE_SIZE:
- 256
- 256
NUM_JOINTS: 16
EXTRA:
TARGET_TYPE: gaussian
SIGMA: 2
HEATMAP_SIZE:
- 64
- 64
FINAL_CONV_KERNEL: 1
DECONV_WITH_BIAS: false
NUM_DECONV_LAYERS: 3
NUM_DECONV_FILTERS:
- 256
- 256
- 256
NUM_DECONV_KERNELS:
- 4
- 4
- 4
NUM_LAYERS: 50
LOSS:
USE_TARGET_WEIGHT: true
TRAIN:
BATCH_SIZE: 32
SHUFFLE: true
BEGIN_EPOCH: 0
END_EPOCH: 140
RESUME: false
OPTIMIZER: adam
LR: 0.001
LR_FACTOR: 0.1
LR_STEP:
- 90
- 120
WD: 0.0001
GAMMA1: 0.99
GAMMA2: 0.0
MOMENTUM: 0.9
NESTEROV: false
TEST:
BATCH_SIZE: 32
FLIP_TEST: false
MODEL_FILE: ''
DEBUG:
DEBUG: false
SAVE_BATCH_IMAGES_GT: true
SAVE_BATCH_IMAGES_PRED: true
SAVE_HEATMAPS_GT: true
SAVE_HEATMAPS_PRED: true

4
lib/Makefile Executable file

@@ -0,0 +1,4 @@
all:
cd nms; python setup.py build_ext --inplace; rm -rf build; cd ../../
clean:
cd nms; rm *.so; cd ../../

225
lib/core/config.py Executable file

@@ -0,0 +1,225 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import yaml
import numpy as np
from easydict import EasyDict as edict
config = edict()
config.OUTPUT_DIR = ''
config.LOG_DIR = ''
config.DATA_DIR = ''
config.GPUS = '0'
config.WORKERS = 4
config.PRINT_FREQ = 20
# Cudnn related params
config.CUDNN = edict()
config.CUDNN.BENCHMARK = True
config.CUDNN.DETERMINISTIC = False
config.CUDNN.ENABLED = True
# pose_resnet related params
POSE_RESNET = edict()
POSE_RESNET.NUM_LAYERS = 50
POSE_RESNET.DECONV_WITH_BIAS = False
POSE_RESNET.NUM_DECONV_LAYERS = 3
POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256]
POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4]
POSE_RESNET.FINAL_CONV_KERNEL = 1
POSE_RESNET.TARGET_TYPE = 'gaussian'
POSE_RESNET.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32
POSE_RESNET.SIGMA = 2
MODEL_EXTRAS = {
'pose_resnet': POSE_RESNET,
}
# common params for NETWORK
config.MODEL = edict()
config.MODEL.NAME = 'pose_resnet'
config.MODEL.INIT_WEIGHTS = True
config.MODEL.PRETRAINED = ''
config.MODEL.NUM_JOINTS = 16
config.MODEL.IMAGE_SIZE = [256, 256] # width * height, ex: 192 * 256
config.MODEL.EXTRA = MODEL_EXTRAS[config.MODEL.NAME]
config.LOSS = edict()
config.LOSS.USE_TARGET_WEIGHT = True
# DATASET related params
config.DATASET = edict()
config.DATASET.ROOT = ''
config.DATASET.DATASET = 'mpii'
config.DATASET.TRAIN_SET = 'train'
config.DATASET.TEST_SET = 'valid'
config.DATASET.DATA_FORMAT = 'jpg'
config.DATASET.HYBRID_JOINTS_TYPE = ''
config.DATASET.SELECT_DATA = False
# training data augmentation
config.DATASET.FLIP = True
config.DATASET.SCALE_FACTOR = 0.25
config.DATASET.ROT_FACTOR = 30
# train
config.TRAIN = edict()
config.TRAIN.LR_FACTOR = 0.1
config.TRAIN.LR_STEP = [90, 110]
config.TRAIN.LR = 0.001
config.TRAIN.OPTIMIZER = 'adam'
config.TRAIN.MOMENTUM = 0.9
config.TRAIN.WD = 0.0001
config.TRAIN.NESTEROV = False
config.TRAIN.GAMMA1 = 0.99
config.TRAIN.GAMMA2 = 0.0
config.TRAIN.BEGIN_EPOCH = 0
config.TRAIN.END_EPOCH = 140
config.TRAIN.RESUME = False
config.TRAIN.CHECKPOINT = ''
config.TRAIN.BATCH_SIZE = 32
config.TRAIN.SHUFFLE = True
# testing
config.TEST = edict()
# size of images for each device
config.TEST.BATCH_SIZE = 32
# Test Model Epoch
config.TEST.FLIP_TEST = False
config.TEST.POST_PROCESS = True
config.TEST.SHIFT_HEATMAP = True
config.TEST.USE_GT_BBOX = False
# nms
config.TEST.OKS_THRE = 0.5
config.TEST.IN_VIS_THRE = 0.0
config.TEST.COCO_BBOX_FILE = ''
config.TEST.BBOX_THRE = 1.0
config.TEST.MODEL_FILE = ''
# debug
config.DEBUG = edict()
config.DEBUG.DEBUG = False
config.DEBUG.SAVE_BATCH_IMAGES_GT = False
config.DEBUG.SAVE_BATCH_IMAGES_PRED = False
config.DEBUG.SAVE_HEATMAPS_GT = False
config.DEBUG.SAVE_HEATMAPS_PRED = False
def _update_dict(k, v):
if k == 'DATASET':
if 'MEAN' in v and v['MEAN']:
v['MEAN'] = np.array([eval(x) if isinstance(x, str) else x
for x in v['MEAN']])
if 'STD' in v and v['STD']:
v['STD'] = np.array([eval(x) if isinstance(x, str) else x
for x in v['STD']])
if k == 'MODEL':
if 'EXTRA' in v and 'HEATMAP_SIZE' in v['EXTRA']:
if isinstance(v['EXTRA']['HEATMAP_SIZE'], int):
v['EXTRA']['HEATMAP_SIZE'] = np.array(
[v['EXTRA']['HEATMAP_SIZE'], v['EXTRA']['HEATMAP_SIZE']])
else:
v['EXTRA']['HEATMAP_SIZE'] = np.array(
v['EXTRA']['HEATMAP_SIZE'])
if 'IMAGE_SIZE' in v:
if isinstance(v['IMAGE_SIZE'], int):
v['IMAGE_SIZE'] = np.array([v['IMAGE_SIZE'], v['IMAGE_SIZE']])
else:
v['IMAGE_SIZE'] = np.array(v['IMAGE_SIZE'])
for vk, vv in v.items():
if vk in config[k]:
config[k][vk] = vv
else:
raise ValueError("{}.{} not exist in config.py".format(k, vk))
def update_config(config_file):
exp_config = None
with open(config_file) as f:
exp_config = edict(yaml.load(f))
for k, v in exp_config.items():
if k in config:
if isinstance(v, dict):
_update_dict(k, v)
else:
if k == 'SCALES':
config[k][0] = (tuple(v))
else:
config[k] = v
else:
raise ValueError("{} not exist in config.py".format(k))
def gen_config(config_file):
cfg = dict(config)
for k, v in cfg.items():
if isinstance(v, edict):
cfg[k] = dict(v)
with open(config_file, 'w') as f:
yaml.dump(dict(cfg), f, default_flow_style=False)
def update_dir(model_dir, log_dir, data_dir):
if model_dir:
config.OUTPUT_DIR = model_dir
if log_dir:
config.LOG_DIR = log_dir
if data_dir:
config.DATA_DIR = data_dir
config.DATASET.ROOT = os.path.join(
config.DATA_DIR, config.DATASET.ROOT)
config.TEST.COCO_BBOX_FILE = os.path.join(
config.DATA_DIR, config.TEST.COCO_BBOX_FILE)
config.MODEL.PRETRAINED = os.path.join(
config.DATA_DIR, config.MODEL.PRETRAINED)
def get_model_name(cfg):
name = cfg.MODEL.NAME
full_name = cfg.MODEL.NAME
extra = cfg.MODEL.EXTRA
if name in ['pose_resnet']:
name = '{model}_{num_layers}'.format(
model=name,
num_layers=extra.NUM_LAYERS)
deconv_suffix = ''.join(
'd{}'.format(num_filters)
for num_filters in extra.NUM_DECONV_FILTERS)
full_name = '{height}x{width}_{name}_{deconv_suffix}'.format(
height=cfg.MODEL.IMAGE_SIZE[1],
width=cfg.MODEL.IMAGE_SIZE[0],
name=name,
deconv_suffix=deconv_suffix)
else:
raise ValueError('Unknown model: {}'.format(cfg.MODEL))
return name, full_name
if __name__ == '__main__':
import sys
gen_config(sys.argv[1])
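For orientation, a minimal usage sketch of this config module (the YAML path is an example taken from the README, and `lib/` is assumed to be on `PYTHONPATH`):
```python
# Sketch: load an experiment YAML into the global config and derive
# the model name used for output/log directories.
from core.config import config, update_config, get_model_name

update_config('experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml')
name, full_name = get_model_name(config)
print(name)       # pose_resnet_50
print(full_name)  # 256x256_pose_resnet_50_d256d256d256
```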

71
lib/core/evaluate.py Normal file

@@ -0,0 +1,71 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from core.inference import get_max_preds
def calc_dists(preds, target, normalize):
preds = preds.astype(np.float32)
target = target.astype(np.float32)
dists = np.zeros((preds.shape[1], preds.shape[0]))
for n in range(preds.shape[0]):
for c in range(preds.shape[1]):
if target[n, c, 0] > 1 and target[n, c, 1] > 1:
normed_preds = preds[n, c, :] / normalize[n]
normed_targets = target[n, c, :] / normalize[n]
dists[c, n] = np.linalg.norm(normed_preds - normed_targets)
else:
dists[c, n] = -1
return dists
def dist_acc(dists, thr=0.5):
''' Return percentage below threshold while ignoring values with a -1 '''
dist_cal = np.not_equal(dists, -1)
num_dist_cal = dist_cal.sum()
if num_dist_cal > 0:
return np.less(dists[dist_cal], thr).sum() * 1.0 / num_dist_cal
else:
return -1
def accuracy(output, target, hm_type='gaussian', thr=0.5):
'''
Calculate accuracy according to PCK,
but uses ground truth heatmap rather than x,y locations
First value to be returned is average accuracy across 'idxs',
followed by individual accuracies
'''
idx = list(range(output.shape[1]))
norm = 1.0
if hm_type == 'gaussian':
pred, _ = get_max_preds(output)
target, _ = get_max_preds(target)
h = output.shape[2]
w = output.shape[3]
norm = np.ones((pred.shape[0], 2)) * np.array([h, w]) / 10
dists = calc_dists(pred, target, norm)
acc = np.zeros((len(idx) + 1))
avg_acc = 0
cnt = 0
for i in range(len(idx)):
acc[i + 1] = dist_acc(dists[idx[i]])
if acc[i + 1] >= 0:
avg_acc = avg_acc + acc[i + 1]
cnt += 1
avg_acc = avg_acc / cnt if cnt != 0 else 0
if cnt != 0:
acc[0] = avg_acc
return acc, avg_acc, cnt, pred
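As a quick sanity check, `accuracy` can be exercised on random heatmaps (a sketch; shapes follow the `[batch, joints, height, width]` convention used above):
```python
import numpy as np
from core.evaluate import accuracy

# Random "heatmaps" just to exercise the API; real inputs come from the model.
output = np.random.rand(2, 16, 64, 64).astype(np.float32)
target = np.random.rand(2, 16, 64, 64).astype(np.float32)
acc, avg_acc, cnt, pred = accuracy(output, target)
print(avg_acc, cnt)  # mean PCK over joints and the number of joints counted
```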

239
lib/core/function.py Executable file

@@ -0,0 +1,239 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import time
import os
import numpy as np
import torch
from core.config import get_model_name
from core.evaluate import accuracy
from core.inference import get_final_preds
from utils.transforms import flip_back
from utils.vis import save_debug_images
logger = logging.getLogger(__name__)
def train(config, train_loader, model, criterion, optimizer, epoch,
output_dir, tb_log_dir, writer_dict):
batch_time = AverageMeter()
data_time = AverageMeter()
losses = AverageMeter()
acc = AverageMeter()
# switch to train mode
model.train()
end = time.time()
for i, (input, target, target_weight, meta) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
# compute output
output = model(input)
target = target.cuda(non_blocking=True)
target_weight = target_weight.cuda(non_blocking=True)
loss = criterion(output, target, target_weight)
# compute gradient and do update step
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure accuracy and record loss
losses.update(loss.item(), input.size(0))
_, avg_acc, cnt, pred = accuracy(output.detach().cpu().numpy(),
target.detach().cpu().numpy())
acc.update(avg_acc, cnt)
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % config.PRINT_FREQ == 0:
msg = 'Epoch: [{0}][{1}/{2}]\t' \
'Time {batch_time.val:.3f}s ({batch_time.avg:.3f}s)\t' \
'Speed {speed:.1f} samples/s\t' \
'Data {data_time.val:.3f}s ({data_time.avg:.3f}s)\t' \
'Loss {loss.val:.5f} ({loss.avg:.5f})\t' \
'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(
epoch, i, len(train_loader), batch_time=batch_time,
speed=input.size(0)/batch_time.val,
data_time=data_time, loss=losses, acc=acc)
logger.info(msg)
writer = writer_dict['writer']
global_steps = writer_dict['train_global_steps']
writer.add_scalar('train_loss', losses.val, global_steps)
writer.add_scalar('train_acc', acc.val, global_steps)
writer_dict['train_global_steps'] = global_steps + 1
prefix = '{}_{}'.format(os.path.join(output_dir, 'train'), i)
save_debug_images(config, input, meta, target, pred*4, output,
prefix)
def validate(config, val_loader, val_dataset, model, criterion, output_dir,
tb_log_dir, writer_dict=None):
batch_time = AverageMeter()
losses = AverageMeter()
acc = AverageMeter()
# switch to evaluate mode
model.eval()
num_samples = len(val_dataset)
all_preds = np.zeros((num_samples, config.MODEL.NUM_JOINTS, 3),
dtype=np.float32)
all_boxes = np.zeros((num_samples, 6))
image_path = []
filenames = []
imgnums = []
idx = 0
with torch.no_grad():
end = time.time()
for i, (input, target, target_weight, meta) in enumerate(val_loader):
# compute output
output = model(input)
if config.TEST.FLIP_TEST:
# this part is ugly, because pytorch does not support negative indexing
# input_flipped = model(input[:, :, :, ::-1])
input_flipped = np.flip(input.cpu().numpy(), 3).copy()
input_flipped = torch.from_numpy(input_flipped).cuda()
output_flipped = model(input_flipped)
output_flipped = flip_back(output_flipped.cpu().numpy(),
val_dataset.flip_pairs)
output_flipped = torch.from_numpy(output_flipped.copy()).cuda()
# feature is not aligned, shift flipped heatmap for higher accuracy
if config.TEST.SHIFT_HEATMAP:
output_flipped[:, :, :, 1:] = \
output_flipped.clone()[:, :, :, 0:-1]
# output_flipped[:, :, :, 0] = 0
output = (output + output_flipped) * 0.5
target = target.cuda(non_blocking=True)
target_weight = target_weight.cuda(non_blocking=True)
loss = criterion(output, target, target_weight)
num_images = input.size(0)
# measure accuracy and record loss
losses.update(loss.item(), num_images)
_, avg_acc, cnt, pred = accuracy(output.cpu().numpy(),
target.cpu().numpy())
acc.update(avg_acc, cnt)
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
c = meta['center'].numpy()
s = meta['scale'].numpy()
score = meta['score'].numpy()
preds, maxvals = get_final_preds(
config, output.clone().cpu().numpy(), c, s)
all_preds[idx:idx + num_images, :, 0:2] = preds[:, :, 0:2]
all_preds[idx:idx + num_images, :, 2:3] = maxvals
# double check this all_boxes parts
all_boxes[idx:idx + num_images, 0:2] = c[:, 0:2]
all_boxes[idx:idx + num_images, 2:4] = s[:, 0:2]
all_boxes[idx:idx + num_images, 4] = np.prod(s*200, 1)
all_boxes[idx:idx + num_images, 5] = score
image_path.extend(meta['image'])
if config.DATASET.DATASET == 'posetrack':
filenames.extend(meta['filename'])
imgnums.extend(meta['imgnum'].numpy())
idx += num_images
if i % config.PRINT_FREQ == 0:
msg = 'Test: [{0}/{1}]\t' \
'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \
'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(
i, len(val_loader), batch_time=batch_time,
loss=losses, acc=acc)
logger.info(msg)
prefix = '{}_{}'.format(os.path.join(output_dir, 'val'), i)
save_debug_images(config, input, meta, target, pred*4, output,
prefix)
name_values, perf_indicator = val_dataset.evaluate(
config, all_preds, output_dir, all_boxes, image_path,
filenames, imgnums)
_, full_arch_name = get_model_name(config)
if isinstance(name_values, list):
for name_value in name_values:
_print_name_value(name_value, full_arch_name)
else:
_print_name_value(name_values, full_arch_name)
if writer_dict:
writer = writer_dict['writer']
global_steps = writer_dict['valid_global_steps']
writer.add_scalar('valid_loss', losses.avg, global_steps)
writer.add_scalar('valid_acc', acc.avg, global_steps)
if isinstance(name_values, list):
for name_value in name_values:
writer.add_scalars('valid', dict(name_value), global_steps)
else:
writer.add_scalars('valid', dict(name_values), global_steps)
writer_dict['valid_global_steps'] = global_steps + 1
return perf_indicator
# markdown format output
def _print_name_value(name_value, full_arch_name):
names = name_value.keys()
values = name_value.values()
num_values = len(name_value)
logger.info(
'| Arch ' +
' '.join(['| {}'.format(name) for name in names]) +
' |'
)
logger.info('|---' * (num_values+1) + '|')
logger.info(
'| ' + full_arch_name + ' ' +
' '.join(['| {:.3f}'.format(value) for value in values]) +
' |'
)
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count if self.count != 0 else 0
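A tiny illustration of how `AverageMeter` accumulates batch statistics (numbers are hypothetical):
```python
from core.function import AverageMeter

meter = AverageMeter()
for batch_loss, batch_size in [(0.9, 32), (0.7, 32), (0.5, 16)]:
    meter.update(batch_loss, batch_size)
print(meter.val)  # 0.5, the most recent value
print(meter.avg)  # 0.74, the running average weighted by batch size
```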

74
lib/core/inference.py Normal file

@@ -0,0 +1,74 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
from utils.transforms import transform_preds
def get_max_preds(batch_heatmaps):
'''
get predictions from score maps
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
'''
assert isinstance(batch_heatmaps, np.ndarray), \
'batch_heatmaps should be numpy.ndarray'
assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = batch_heatmaps.shape[0]
num_joints = batch_heatmaps.shape[1]
width = batch_heatmaps.shape[3]
heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def get_final_preds(config, batch_heatmaps, center, scale):
coords, maxvals = get_max_preds(batch_heatmaps)
heatmap_height = batch_heatmaps.shape[2]
heatmap_width = batch_heatmaps.shape[3]
# post-processing
if config.TEST.POST_PROCESS:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = batch_heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1:
diff = np.array([hm[py][px+1] - hm[py][px-1],
hm[py+1][px]-hm[py-1][px]])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
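To make the coordinate decoding concrete, a small sketch of `get_max_preds` on a synthetic heatmap with one known peak:
```python
import numpy as np
from core.inference import get_max_preds

# One image, one joint, 64x48 heatmap with a single peak at (x=30, y=20).
heatmaps = np.zeros((1, 1, 64, 48), dtype=np.float32)
heatmaps[0, 0, 20, 30] = 1.0
preds, maxvals = get_max_preds(heatmaps)
print(preds[0, 0])    # [30. 20.]  (x = index % width, y = index // width)
print(maxvals[0, 0])  # [1.]
```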

38
lib/core/loss.py Normal file

@@ -0,0 +1,38 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import torch.nn as nn
class JointsMSELoss(nn.Module):
def __init__(self, use_target_weight):
super(JointsMSELoss, self).__init__()
self.criterion = nn.MSELoss(size_average=True)
self.use_target_weight = use_target_weight
def forward(self, output, target, target_weight):
batch_size = output.size(0)
num_joints = output.size(1)
heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)
heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)
loss = 0
for idx in range(num_joints):
heatmap_pred = heatmaps_pred[idx].squeeze()
heatmap_gt = heatmaps_gt[idx].squeeze()
if self.use_target_weight:
loss += 0.5 * self.criterion(
heatmap_pred.mul(target_weight[:, idx]),
heatmap_gt.mul(target_weight[:, idx])
)
else:
loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt)
return loss / num_joints
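A minimal sketch of calling this loss (random tensors stand in for real predicted and ground-truth heatmaps):
```python
import torch
from core.loss import JointsMSELoss

criterion = JointsMSELoss(use_target_weight=True)
output = torch.rand(4, 16, 64, 64)    # predicted heatmaps
target = torch.rand(4, 16, 64, 64)    # ground-truth heatmaps
target_weight = torch.ones(4, 16, 1)  # 1 = joint visible, 0 = ignored
loss = criterion(output, target, target_weight)
print(loss.item())
```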

218
lib/dataset/JointsDataset.py Executable file

@@ -0,0 +1,218 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import logging
import random
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from utils.transforms import get_affine_transform
from utils.transforms import affine_transform
from utils.transforms import fliplr_joints
logger = logging.getLogger(__name__)
class JointsDataset(Dataset):
def __init__(self, cfg, root, image_set, is_train, transform=None):
self.num_joints = 0
self.pixel_std = 200
self.flip_pairs = []
self.parent_ids = []
self.is_train = is_train
self.root = root
self.image_set = image_set
self.output_path = cfg.OUTPUT_DIR
self.data_format = cfg.DATASET.DATA_FORMAT
self.scale_factor = cfg.DATASET.SCALE_FACTOR
self.rotation_factor = cfg.DATASET.ROT_FACTOR
self.flip = cfg.DATASET.FLIP
self.image_size = cfg.MODEL.IMAGE_SIZE
self.target_type = cfg.MODEL.EXTRA.TARGET_TYPE
self.heatmap_size = cfg.MODEL.EXTRA.HEATMAP_SIZE
self.sigma = cfg.MODEL.EXTRA.SIGMA
self.transform = transform
self.db = []
def _get_db(self):
raise NotImplementedError
def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
raise NotImplementedError
def __len__(self,):
return len(self.db)
def __getitem__(self, idx):
db_rec = copy.deepcopy(self.db[idx])
image_file = db_rec['image']
filename = db_rec['filename'] if 'filename' in db_rec else ''
imgnum = db_rec['imgnum'] if 'imgnum' in db_rec else ''
if self.data_format == 'zip':
from utils import zipreader
data_numpy = zipreader.imread(
image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
else:
data_numpy = cv2.imread(
image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
joints = db_rec['joints_3d']
joints_vis = db_rec['joints_3d_vis']
c = db_rec['center']
s = db_rec['scale']
score = db_rec['score'] if 'score' in db_rec else 1
r = 0
if self.is_train:
sf = self.scale_factor
rf = self.rotation_factor
s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf)
r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \
if random.random() <= 0.6 else 0
if self.flip and random.random() <= 0.5:
data_numpy = data_numpy[:, ::-1, :]
joints, joints_vis = fliplr_joints(
joints, joints_vis, data_numpy.shape[1], self.flip_pairs)
c[0] = data_numpy.shape[1] - c[0] - 1
trans = get_affine_transform(c, s, r, self.image_size)
input = cv2.warpAffine(
data_numpy,
trans,
(int(self.image_size[0]), int(self.image_size[1])),
flags=cv2.INTER_LINEAR)
if self.transform:
input = self.transform(input)
for i in range(self.num_joints):
if joints_vis[i, 0] > 0.0:
joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)
target, target_weight = self.generate_target(joints, joints_vis)
target = torch.from_numpy(target)
target_weight = torch.from_numpy(target_weight)
meta = {
'image': image_file,
'filename': filename,
'imgnum': imgnum,
'joints': joints,
'joints_vis': joints_vis,
'center': c,
'scale': s,
'rotation': r,
'score': score
}
return input, target, target_weight, meta
def select_data(self, db):
db_selected = []
for rec in db:
num_vis = 0
joints_x = 0.0
joints_y = 0.0
for joint, joint_vis in zip(
rec['joints_3d'], rec['joints_3d_vis']):
if joint_vis[0] <= 0:
continue
num_vis += 1
joints_x += joint[0]
joints_y += joint[1]
if num_vis == 0:
continue
joints_x, joints_y = joints_x / num_vis, joints_y / num_vis
area = rec['scale'][0] * rec['scale'][1] * (self.pixel_std**2)
joints_center = np.array([joints_x, joints_y])
bbox_center = np.array(rec['center'])
diff_norm2 = np.linalg.norm((joints_center-bbox_center), 2)
ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area))
metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16
if ks > metric:
db_selected.append(rec)
logger.info('=> num db: {}'.format(len(db)))
logger.info('=> num selected db: {}'.format(len(db_selected)))
return db_selected
def generate_target(self, joints, joints_vis):
'''
:param joints: [num_joints, 3]
:param joints_vis: [num_joints, 3]
:return: target, target_weight(1: visible, 0: invisible)
'''
target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
target_weight[:, 0] = joints_vis[:, 0]
assert self.target_type == 'gaussian', \
'Only support gaussian map now!'
if self.target_type == 'gaussian':
target = np.zeros((self.num_joints,
self.heatmap_size[1],
self.heatmap_size[0]),
dtype=np.float32)
tmp_size = self.sigma * 3
for joint_id in range(self.num_joints):
feat_stride = self.image_size / self.heatmap_size
mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
# Check that any part of the gaussian is in-bounds
ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
if ul[0] >= self.heatmap_size[0] or ul[1] >= self.heatmap_size[1] \
or br[0] < 0 or br[1] < 0:
# If not, just return the image as is
target_weight[joint_id] = 0
continue
# # Generate gaussian
size = 2 * tmp_size + 1
x = np.arange(0, size, 1, np.float32)
y = x[:, np.newaxis]
x0 = y0 = size // 2
# The gaussian is not normalized, we want the center value to equal 1
g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * self.sigma ** 2))
# Usable gaussian range
g_x = max(0, -ul[0]), min(br[0], self.heatmap_size[0]) - ul[0]
g_y = max(0, -ul[1]), min(br[1], self.heatmap_size[1]) - ul[1]
# Image range
img_x = max(0, ul[0]), min(br[0], self.heatmap_size[0])
img_y = max(0, ul[1]), min(br[1], self.heatmap_size[1])
v = target_weight[joint_id]
if v > 0.5:
target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \
g[g_y[0]:g_y[1], g_x[0]:g_x[1]]
return target, target_weight
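To illustrate the gaussian target encoding, a sketch that calls `generate_target` directly, bypassing `__init__` (which normally fills these attributes from `cfg`; the values below mirror the MPII configs):
```python
import numpy as np
from dataset.JointsDataset import JointsDataset

ds = JointsDataset.__new__(JointsDataset)  # skip __init__ for the sketch
ds.num_joints = 1
ds.target_type = 'gaussian'
ds.image_size = np.array([256, 256])   # input width, height
ds.heatmap_size = np.array([64, 64])   # heatmap width, height
ds.sigma = 2
joints = np.array([[128.0, 128.0, 0.0]])  # one joint at the image center
joints_vis = np.array([[1.0, 1.0, 0.0]])
target, target_weight = ds.generate_target(joints, joints_vis)
print(target.shape)       # (1, 64, 64)
print(target[0, 32, 32])  # 1.0 at the gaussian peak
```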

12
lib/dataset/__init__.py Normal file

@@ -0,0 +1,12 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .mpii import MPIIDataset as mpii
from .coco import COCODataset as coco

404
lib/dataset/coco.py Executable file

@@ -0,0 +1,404 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import os
import pickle
from collections import defaultdict
from collections import OrderedDict
import json_tricks as json
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from dataset.JointsDataset import JointsDataset
from nms.nms import oks_nms
logger = logging.getLogger(__name__)
class COCODataset(JointsDataset):
'''
"keypoints": {
0: "nose",
1: "left_eye",
2: "right_eye",
3: "left_ear",
4: "right_ear",
5: "left_shoulder",
6: "right_shoulder",
7: "left_elbow",
8: "right_elbow",
9: "left_wrist",
10: "right_wrist",
11: "left_hip",
12: "right_hip",
13: "left_knee",
14: "right_knee",
15: "left_ankle",
16: "right_ankle"
},
"skeleton": [
[16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13], [6,7],[6,8],
[7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]]
'''
def __init__(self, cfg, root, image_set, is_train, transform=None):
super().__init__(cfg, root, image_set, is_train, transform)
self.nms_thre = cfg.TEST.NMS_THRE
self.image_thre = cfg.TEST.IMAGE_THRE
self.oks_thre = cfg.TEST.OKS_THRE
self.in_vis_thre = cfg.TEST.IN_VIS_THRE
self.bbox_file = cfg.TEST.COCO_BBOX_FILE
self.use_gt_bbox = cfg.TEST.USE_GT_BBOX
self.image_width = cfg.MODEL.IMAGE_SIZE[0]
self.image_height = cfg.MODEL.IMAGE_SIZE[1]
self.aspect_ratio = self.image_width * 1.0 / self.image_height
self.pixel_std = 200
self.coco = COCO(self._get_ann_file_keypoint())
# deal with class names
cats = [cat['name']
for cat in self.coco.loadCats(self.coco.getCatIds())]
self.classes = ['__background__'] + cats
logger.info('=> classes: {}'.format(self.classes))
self.num_classes = len(self.classes)
self._class_to_ind = dict(zip(self.classes, range(self.num_classes)))
self._class_to_coco_ind = dict(zip(cats, self.coco.getCatIds()))
self._coco_ind_to_class_ind = dict([(self._class_to_coco_ind[cls],
self._class_to_ind[cls])
for cls in self.classes[1:]])
# load image file names
self.image_set_index = self._load_image_set_index()
self.num_images = len(self.image_set_index)
logger.info('=> num_images: {}'.format(self.num_images))
self.num_joints = 17
self.flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8],
[9, 10], [11, 12], [13, 14], [15, 16]]
self.parent_ids = None
self.db = self._get_db()
if is_train and cfg.DATASET.SELECT_DATA:
self.db = self.select_data(self.db)
logger.info('=> load {} samples'.format(len(self.db)))
def _get_ann_file_keypoint(self):
""" self.root / annotations / person_keypoints_train2017.json """
prefix = 'person_keypoints' \
if 'test' not in self.image_set else 'image_info'
return os.path.join(self.root, 'annotations',
prefix + '_' + self.image_set + '.json')
def _load_image_set_index(self):
""" image id: int """
image_ids = self.coco.getImgIds()
return image_ids
def _get_db(self):
if self.is_train or self.use_gt_bbox:
# use ground truth bbox
gt_db = self._load_coco_keypoint_annotations()
else:
# use bbox from detection
gt_db = self._load_coco_person_detection_results()
return gt_db
def _load_coco_keypoint_annotations(self):
""" ground truth bbox and keypoints """
gt_db = []
for index in self.image_set_index:
gt_db.extend(self._load_coco_keypoint_annotation_kernal(index))
return gt_db
def _load_coco_keypoint_annotation_kernal(self, index):
"""
coco ann: [u'segmentation', u'area', u'iscrowd', u'image_id', u'bbox', u'category_id', u'id']
iscrowd:
crowd instances are handled by marking their overlaps with all categories to -1
and later excluded in training
bbox:
[x1, y1, w, h]
:param index: coco image id
:return: db entry
"""
im_ann = self.coco.loadImgs(index)[0]
width = im_ann['width']
height = im_ann['height']
annIds = self.coco.getAnnIds(imgIds=index, iscrowd=False)
objs = self.coco.loadAnns(annIds)
# sanitize bboxes
valid_objs = []
for obj in objs:
x, y, w, h = obj['bbox']
x1 = np.max((0, x))
y1 = np.max((0, y))
x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
# obj['clean_bbox'] = [x1, y1, x2, y2]
obj['clean_bbox'] = [x1, y1, x2-x1, y2-y1]
valid_objs.append(obj)
objs = valid_objs
rec = []
for obj in objs:
cls = self._coco_ind_to_class_ind[obj['category_id']]
if cls != 1:
continue
# ignore objs without keypoints annotation
if max(obj['keypoints']) == 0:
continue
joints_3d = np.zeros((self.num_joints, 3), dtype=np.float)
joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float)
for ipt in range(self.num_joints):
joints_3d[ipt, 0] = obj['keypoints'][ipt * 3 + 0]
joints_3d[ipt, 1] = obj['keypoints'][ipt * 3 + 1]
joints_3d[ipt, 2] = 0
t_vis = obj['keypoints'][ipt * 3 + 2]
if t_vis > 1:
t_vis = 1
joints_3d_vis[ipt, 0] = t_vis
joints_3d_vis[ipt, 1] = t_vis
joints_3d_vis[ipt, 2] = 0
center, scale = self._box2cs(obj['clean_bbox'][:4])
rec.append({
'image': self.image_path_from_index(index),
'center': center,
'scale': scale,
'joints_3d': joints_3d,
'joints_3d_vis': joints_3d_vis,
'filename': '',
'imgnum': 0,
})
return rec
def _box2cs(self, box):
x, y, w, h = box[:4]
return self._xywh2cs(x, y, w, h)
def _xywh2cs(self, x, y, w, h):
center = np.zeros((2), dtype=np.float32)
center[0] = x + w * 0.5
center[1] = y + h * 0.5
if w > self.aspect_ratio * h:
h = w * 1.0 / self.aspect_ratio
elif w < self.aspect_ratio * h:
w = h * self.aspect_ratio
scale = np.array(
[w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std],
dtype=np.float32)
if center[0] != -1:
scale = scale * 1.25
return center, scale
def image_path_from_index(self, index):
""" example: images / train2017 / 000000119993.jpg """
file_name = '%012d.jpg' % index
if '2014' in self.image_set:
file_name = 'COCO_%s_' % self.image_set + file_name
prefix = 'test2017' if 'test' in self.image_set else self.image_set
data_name = prefix + '.zip@' if self.data_format == 'zip' else prefix
image_path = os.path.join(
self.root, 'images', data_name, file_name)
return image_path
def _load_coco_person_detection_results(self):
all_boxes = None
with open(self.bbox_file, 'r') as f:
all_boxes = json.load(f)
if not all_boxes:
logger.error('=> Load %s fail!' % self.bbox_file)
return None
logger.info('=> Total boxes: {}'.format(len(all_boxes)))
kpt_db = []
num_boxes = 0
for n_img in range(0, len(all_boxes)):
det_res = all_boxes[n_img]
if det_res['category_id'] != 1:
continue
img_name = self.image_path_from_index(det_res['image_id'])
box = det_res['bbox']
score = det_res['score']
if score < self.image_thre:
continue
num_boxes = num_boxes + 1
center, scale = self._box2cs(box)
joints_3d = np.zeros((self.num_joints, 3), dtype=np.float)
joints_3d_vis = np.ones(
(self.num_joints, 3), dtype=np.float)
kpt_db.append({
'image': img_name,
'center': center,
'scale': scale,
'score': score,
'joints_3d': joints_3d,
'joints_3d_vis': joints_3d_vis,
})
logger.info('=> Total boxes after filtering low score@{}: {}'.format(
self.image_thre, num_boxes))
return kpt_db
# need double check this API and classes field
def evaluate(self, cfg, preds, output_dir, all_boxes, img_path,
*args, **kwargs):
res_folder = os.path.join(output_dir, 'results')
if not os.path.exists(res_folder):
os.makedirs(res_folder)
res_file = os.path.join(
res_folder, 'keypoints_%s_results.json' % self.image_set)
# person x (keypoints)
_kpts = []
for idx, kpt in enumerate(preds):
_kpts.append({
'keypoints': kpt,
'center': all_boxes[idx][0:2],
'scale': all_boxes[idx][2:4],
'area': all_boxes[idx][4],
'score': all_boxes[idx][5],
'image': int(img_path[idx][-16:-4])
})
# image x person x (keypoints)
kpts = defaultdict(list)
for kpt in _kpts:
kpts[kpt['image']].append(kpt)
# rescoring and oks nms
num_joints = self.num_joints
in_vis_thre = self.in_vis_thre
oks_thre = self.oks_thre
oks_nmsed_kpts = []
for img in kpts.keys():
img_kpts = kpts[img]
for n_p in img_kpts:
box_score = n_p['score']
kpt_score = 0
valid_num = 0
for n_jt in range(0, num_joints):
t_s = n_p['keypoints'][n_jt][2]
if t_s > in_vis_thre:
kpt_score = kpt_score + t_s
valid_num = valid_num + 1
if valid_num != 0:
kpt_score = kpt_score / valid_num
# rescoring
n_p['score'] = kpt_score * box_score
keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))],
oks_thre)
if len(keep) == 0:
oks_nmsed_kpts.append(img_kpts)
else:
oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep])
self._write_coco_keypoint_results(
oks_nmsed_kpts, res_file)
if 'test' not in self.image_set:
info_str = self._do_python_keypoint_eval(
res_file, res_folder)
name_value = OrderedDict(info_str)
return name_value, name_value['AP']
else:
return {'Null': 0}, 0
def _write_coco_keypoint_results(self, keypoints, res_file):
data_pack = [{'cat_id': self._class_to_coco_ind[cls],
'cls_ind': cls_ind,
'cls': cls,
'ann_type': 'keypoints',
'keypoints': keypoints
}
for cls_ind, cls in enumerate(self.classes) if not cls == '__background__']
results = self._coco_keypoint_results_one_category_kernel(data_pack[0])
logger.info('=> Writing results json to %s' % res_file)
with open(res_file, 'w') as f:
json.dump(results, f, sort_keys=True, indent=4)
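# if the freshly written file fails to parse as JSON, rewrite its last
# line to a closing bracket so the result loads cleanly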
try:
json.load(open(res_file))
except Exception:
content = []
with open(res_file, 'r') as f:
for line in f:
content.append(line)
content[-1] = ']'
with open(res_file, 'w') as f:
for c in content:
f.write(c)
def _coco_keypoint_results_one_category_kernel(self, data_pack):
cat_id = data_pack['cat_id']
keypoints = data_pack['keypoints']
cat_results = []
for img_kpts in keypoints:
if len(img_kpts) == 0:
continue
_key_points = np.array([img_kpts[k]['keypoints']
for k in range(len(img_kpts))])
key_points = np.zeros(
(_key_points.shape[0], self.num_joints * 3), dtype=np.float)
for ipt in range(self.num_joints):
key_points[:, ipt * 3 + 0] = _key_points[:, ipt, 0]
key_points[:, ipt * 3 + 1] = _key_points[:, ipt, 1]
key_points[:, ipt * 3 + 2] = _key_points[:, ipt, 2] # keypoints score.
result = [{'image_id': img_kpts[k]['image'],
'category_id': cat_id,
'keypoints': list(key_points[k]),
'score': img_kpts[k]['score'],
'center': list(img_kpts[k]['center']),
'scale': list(img_kpts[k]['scale'])
} for k in range(len(img_kpts))]
cat_results.extend(result)
return cat_results
def _do_python_keypoint_eval(self, res_file, res_folder):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'keypoints')
coco_eval.params.useSegm = None
coco_eval.evaluate()
coco_eval.accumulate()
info_str = coco_eval.summarize()
eval_file = os.path.join(
res_folder, 'keypoints_%s_results.pkl' % self.image_set)
with open(eval_file, 'wb') as f:
pickle.dump(coco_eval, f, pickle.HIGHEST_PROTOCOL)
logger.info('=> coco eval results saved to %s' % eval_file)
return info_str
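A worked sketch of the rescoring rule in evaluate() above, with invented numbers: the final pose score is the detector's box score times the mean confidence of the joints whose confidence exceeds in_vis_thre.

import numpy as np

box_score = 0.9
in_vis_thre = 0.2
joint_conf = np.array([0.8, 0.6, 0.1, 0.7])   # hypothetical per-joint confidences

valid = joint_conf > in_vis_thre              # the 0.1 joint is ignored
kpt_score = joint_conf[valid].mean() if valid.any() else 0.0
rescored = kpt_score * box_score              # 0.7 * 0.9 = 0.63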

176
lib/dataset/mpii.py Normal file

@@ -0,0 +1,176 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import logging
import os
import json_tricks as json
import numpy as np
from scipy.io import loadmat, savemat
from dataset.JointsDataset import JointsDataset
logger = logging.getLogger(__name__)
class MPIIDataset(JointsDataset):
def __init__(self, cfg, root, image_set, is_train, transform=None):
super(MPIIDataset, self).__init__(cfg, root, image_set, is_train, transform)
self.num_joints = 16
self.flip_pairs = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
self.parent_ids = [1, 2, 6, 6, 3, 4, 6, 6, 7, 8, 11, 12, 7, 7, 13, 14]
self.db = self._get_db()
if is_train and cfg.DATASET.SELECT_DATA:
self.db = self.select_data(self.db)
logger.info('=> load {} samples'.format(len(self.db)))
def _get_db(self):
# create train/val split
file_name = os.path.join(self.root,
'annot',
self.image_set+'.json')
with open(file_name) as anno_file:
anno = json.load(anno_file)
gt_db = []
for a in anno:
image_name = a['image']
c = np.array(a['center'], dtype=np.float)
s = np.array([a['scale'], a['scale']], dtype=np.float)
# Adjust center/scale slightly to avoid cropping limbs
if c[0] != -1:
c[1] = c[1] + 15 * s[1]
s = s * 1.25
# MPII uses Matlab format: indices are 1-based,
# so convert them to 0-based indices first
c = c - 1
joints_3d = np.zeros((self.num_joints, 3), dtype=np.float)
joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float)
if self.image_set != 'test':
joints = np.array(a['joints'])
joints[:, 0:2] = joints[:, 0:2] - 1
joints_vis = np.array(a['joints_vis'])
assert len(joints) == self.num_joints, \
'joint num diff: {} vs {}'.format(len(joints),
self.num_joints)
joints_3d[:, 0:2] = joints[:, 0:2]
joints_3d_vis[:, 0] = joints_vis[:]
joints_3d_vis[:, 1] = joints_vis[:]
image_dir = 'images.zip@' if self.data_format == 'zip' else 'images'
gt_db.append({
'image': os.path.join(self.root, image_dir, image_name),
'center': c,
'scale': s,
'joints_3d': joints_3d,
'joints_3d_vis': joints_3d_vis,
'filename': '',
'imgnum': 0,
})
return gt_db
def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
# convert 0-based index to 1-based index
preds = preds[:, :, 0:2] + 1.0
if output_dir:
pred_file = os.path.join(output_dir, 'pred.mat')
savemat(pred_file, mdict={'preds': preds})
if 'test' in cfg.DATASET.TEST_SET:
return {'Null': 0.0}, 0.0
SC_BIAS = 0.6
threshold = 0.5
gt_file = os.path.join(cfg.DATASET.ROOT,
'annot',
'gt_{}.mat'.format(cfg.DATASET.TEST_SET))
gt_dict = loadmat(gt_file)
dataset_joints = gt_dict['dataset_joints']
jnt_missing = gt_dict['jnt_missing']
pos_gt_src = gt_dict['pos_gt_src']
headboxes_src = gt_dict['headboxes_src']
pos_pred_src = np.transpose(preds, [1, 2, 0])
head = np.where(dataset_joints == 'head')[1][0]
lsho = np.where(dataset_joints == 'lsho')[1][0]
lelb = np.where(dataset_joints == 'lelb')[1][0]
lwri = np.where(dataset_joints == 'lwri')[1][0]
lhip = np.where(dataset_joints == 'lhip')[1][0]
lkne = np.where(dataset_joints == 'lkne')[1][0]
lank = np.where(dataset_joints == 'lank')[1][0]
rsho = np.where(dataset_joints == 'rsho')[1][0]
relb = np.where(dataset_joints == 'relb')[1][0]
rwri = np.where(dataset_joints == 'rwri')[1][0]
rkne = np.where(dataset_joints == 'rkne')[1][0]
rank = np.where(dataset_joints == 'rank')[1][0]
rhip = np.where(dataset_joints == 'rhip')[1][0]
jnt_visible = 1 - jnt_missing
uv_error = pos_pred_src - pos_gt_src
uv_err = np.linalg.norm(uv_error, axis=1)
headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :]
headsizes = np.linalg.norm(headsizes, axis=0)
headsizes *= SC_BIAS
scale = np.multiply(headsizes, np.ones((len(uv_err), 1)))
scaled_uv_err = np.divide(uv_err, scale)
scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible)
jnt_count = np.sum(jnt_visible, axis=1)
less_than_threshold = np.multiply((scaled_uv_err <= threshold),
jnt_visible)
PCKh = np.divide(100.*np.sum(less_than_threshold, axis=1), jnt_count)
# save
rng = np.arange(0, 0.5+0.01, 0.01)
pckAll = np.zeros((len(rng), 16))
for r in range(len(rng)):
threshold = rng[r]
less_than_threshold = np.multiply(scaled_uv_err <= threshold,
jnt_visible)
pckAll[r, :] = np.divide(100.*np.sum(less_than_threshold, axis=1),
jnt_count)
PCKh = np.ma.array(PCKh, mask=False)
PCKh.mask[6:8] = True
jnt_count = np.ma.array(jnt_count, mask=False)
jnt_count.mask[6:8] = True
jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)
name_value = [
('Head', PCKh[head]),
('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])),
('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])),
('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])),
('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])),
('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])),
('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])),
('Mean', np.sum(PCKh * jnt_ratio)),
('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio))
]
name_value = OrderedDict(name_value)
return name_value, name_value['Mean']
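For orientation, the PCKh metric computed above counts a joint as correct when its error is within threshold of the normalized head size (SC_BIAS times the head-box diagonal). A minimal sketch with toy numbers, mirroring the normalization in evaluate():

import numpy as np

SC_BIAS = 0.6
threshold = 0.5

pred = np.array([[110., 100.], [210., 205.]])  # two joints of a toy pose
gt = np.array([[100., 100.], [200., 200.]])
head_diag = 50.0                               # head-box diagonal length

headsize = SC_BIAS * head_diag                 # 30.0
err = np.linalg.norm(pred - gt, axis=1) / headsize
pckh = 100.0 * np.mean(err <= threshold)       # both joints pass -> 100.0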

11
lib/models/__init__.py Normal file

@@ -0,0 +1,11 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import models.pose_resnet

257
lib/models/pose_resnet.py Normal file

@@ -0,0 +1,257 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import logging
import torch
import torch.nn as nn
BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)
def conv3x3(in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias=False)
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
bias=False)
self.bn3 = nn.BatchNorm2d(planes * self.expansion,
momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class PoseResNet(nn.Module):
def __init__(self, block, layers, cfg, **kwargs):
self.inplanes = 64
extra = cfg.MODEL.EXTRA
self.deconv_with_bias = extra.DECONV_WITH_BIAS
super(PoseResNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
bias=False)
self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
# used for deconv layers
self.deconv_layers = self._make_deconv_layer(
extra.NUM_DECONV_LAYERS,
extra.NUM_DECONV_FILTERS,
extra.NUM_DECONV_KERNELS,
)
self.final_layer = nn.Conv2d(
in_channels=extra.NUM_DECONV_FILTERS[-1],
out_channels=cfg.MODEL.NUM_JOINTS,
kernel_size=extra.FINAL_CONV_KERNEL,
stride=1,
padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0
)
def _make_layer(self, block, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))
return nn.Sequential(*layers)
def _get_deconv_cfg(self, deconv_kernel, index):
if deconv_kernel == 4:
padding = 1
output_padding = 0
elif deconv_kernel == 3:
padding = 1
output_padding = 1
elif deconv_kernel == 2:
padding = 0
output_padding = 0
else:
raise ValueError('unsupported deconv kernel size: {}'.format(deconv_kernel))
return deconv_kernel, padding, output_padding
def _make_deconv_layer(self, num_layers, num_filters, num_kernels):
assert num_layers == len(num_filters), \
'ERROR: num_deconv_layers is different from len(num_deconv_filters)'
assert num_layers == len(num_kernels), \
'ERROR: num_deconv_layers is different from len(num_deconv_kernels)'
layers = []
for i in range(num_layers):
kernel, padding, output_padding = \
self._get_deconv_cfg(num_kernels[i], i)
planes = num_filters[i]
layers.append(
nn.ConvTranspose2d(
in_channels=self.inplanes,
out_channels=planes,
kernel_size=kernel,
stride=2,
padding=padding,
output_padding=output_padding,
bias=self.deconv_with_bias))
layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))
layers.append(nn.ReLU(inplace=True))
self.inplanes = planes
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.deconv_layers(x)
x = self.final_layer(x)
return x
def init_weights(self, pretrained=''):
if os.path.isfile(pretrained):
logger.info('=> init deconv weights from normal distribution')
for name, m in self.deconv_layers.named_modules():
if isinstance(m, nn.ConvTranspose2d):
logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))
logger.info('=> init {}.bias as 0'.format(name))
nn.init.normal_(m.weight, std=0.001)
if self.deconv_with_bias:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.BatchNorm2d):
logger.info('=> init {}.weight as 1'.format(name))
logger.info('=> init {}.bias as 0'.format(name))
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
logger.info('=> init final conv weights from normal distribution')
for name, m in self.final_layer.named_modules():
if isinstance(m, nn.Conv2d):
# nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))
logger.info('=> init {}.bias as 0'.format(name))
nn.init.normal_(m.weight, std=0.001)
nn.init.constant_(m.bias, 0)
pretrained_state_dict = torch.load(pretrained)
logger.info('=> loading pretrained model {}'.format(pretrained))
self.load_state_dict(pretrained_state_dict, strict=False)
else:
logger.error('=> imagenet pretrained model does not exist')
logger.error('=> please download it first')
raise ValueError('imagenet pretrained model does not exist')
resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]),
34: (BasicBlock, [3, 4, 6, 3]),
50: (Bottleneck, [3, 4, 6, 3]),
101: (Bottleneck, [3, 4, 23, 3]),
152: (Bottleneck, [3, 8, 36, 3])}
def get_pose_net(cfg, is_train, **kwargs):
num_layers = cfg.MODEL.EXTRA.NUM_LAYERS
block_class, layers = resnet_spec[num_layers]
model = PoseResNet(block_class, layers, cfg, **kwargs)
if is_train and cfg.MODEL.INIT_WEIGHTS:
model.init_weights(cfg.MODEL.PRETRAINED)
return model
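A minimal construction sketch for get_pose_net; the config values below are illustrative assumptions written with EasyDict (listed in requirements.txt), not an official experiment config. ResNet-50 downsamples by 32 and the three stride-2 deconvs upsample by 8, so a 256x192 input gives 17 heatmaps of 64x48.

from easydict import EasyDict as edict
import torch

cfg = edict()
cfg.MODEL = edict()
cfg.MODEL.NUM_JOINTS = 17
cfg.MODEL.INIT_WEIGHTS = False                # skip loading pretrained weights here
cfg.MODEL.EXTRA = edict({
    'NUM_LAYERS': 50,
    'DECONV_WITH_BIAS': False,
    'NUM_DECONV_LAYERS': 3,
    'NUM_DECONV_FILTERS': [256, 256, 256],
    'NUM_DECONV_KERNELS': [4, 4, 4],
    'FINAL_CONV_KERNEL': 1,
})

model = get_pose_net(cfg, is_train=False)
out = model(torch.randn(1, 3, 256, 192))      # -> torch.Size([1, 17, 64, 48])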

0
lib/nms/__init__.py Normal file

67
lib/nms/cpu_nms.pyx Normal file

@@ -0,0 +1,67 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
return a if a <= b else b
def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1].astype('i')
cdef int ndets = dets.shape[0]
cdef np.ndarray[np.int_t, ndim=1] suppressed = \
np.zeros((ndets), dtype=np.int)
# nominal indices
cdef int _i, _j
# sorted indices
cdef int i, j
# temp variables for box i's (the box currently under consideration)
cdef np.float32_t ix1, iy1, ix2, iy2, iarea
# variables for computing overlap with box j (lower scoring box)
cdef np.float32_t xx1, yy1, xx2, yy2
cdef np.float32_t w, h
cdef np.float32_t inter, ovr
keep = []
for _i in range(ndets):
i = order[_i]
if suppressed[i] == 1:
continue
keep.append(i)
ix1 = x1[i]
iy1 = y1[i]
ix2 = x2[i]
iy2 = y2[i]
iarea = areas[i]
for _j in range(_i + 1, ndets):
j = order[_j]
if suppressed[j] == 1:
continue
xx1 = max(ix1, x1[j])
yy1 = max(iy1, y1[j])
xx2 = min(ix2, x2[j])
yy2 = min(iy2, y2[j])
w = max(0.0, xx2 - xx1 + 1)
h = max(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
return keep

2
lib/nms/gpu_nms.hpp Normal file

@@ -0,0 +1,2 @@
void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
int boxes_dim, float nms_overlap_thresh, int device_id);

30
lib/nms/gpu_nms.pyx Normal file

@@ -0,0 +1,30 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------
import numpy as np
cimport numpy as np
assert sizeof(int) == sizeof(np.int32_t)
cdef extern from "gpu_nms.hpp":
void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)
def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh,
np.int32_t device_id=0):
cdef int boxes_num = dets.shape[0]
cdef int boxes_dim = dets.shape[1]
cdef int num_out
cdef np.ndarray[np.int32_t, ndim=1] \
keep = np.zeros(boxes_num, dtype=np.int32)
cdef np.ndarray[np.float32_t, ndim=1] \
scores = dets[:, 4]
cdef np.ndarray[np.int32_t, ndim=1] \
order = scores.argsort()[::-1].astype(np.int32)
cdef np.ndarray[np.float32_t, ndim=2] \
sorted_dets = dets[order, :]
_nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
keep = keep[:num_out]
return list(order[keep])

123
lib/nms/nms.py Normal file

@@ -0,0 +1,123 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from .cpu_nms import cpu_nms
from .gpu_nms import gpu_nms
def py_nms_wrapper(thresh):
def _nms(dets):
return nms(dets, thresh)
return _nms
def cpu_nms_wrapper(thresh):
def _nms(dets):
return cpu_nms(dets, thresh)
return _nms
def gpu_nms_wrapper(thresh, device_id):
def _nms(dets):
return gpu_nms(dets, thresh, device_id)
return _nms
def nms(dets, thresh):
"""
greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh
:param dets: [[x1, y1, x2, y2 score]]
:param thresh: retain overlap < thresh
:return: indexes to keep
"""
if dets.shape[0] == 0:
return []
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None):
if not isinstance(sigmas, np.ndarray):
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
vars = (sigmas * 2) ** 2
xg = g[0::3]
yg = g[1::3]
vg = g[2::3]
ious = np.zeros((d.shape[0]))
for n_d in range(0, d.shape[0]):
xd = d[n_d, 0::3]
yd = d[n_d, 1::3]
vd = d[n_d, 2::3]
dx = xd - xg
dy = yd - yg
e = (dx ** 2 + dy ** 2) / vars / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2
if in_vis_thre is not None:
# keep only joints visible (above threshold) in both gt and detection
ind = np.logical_and(vg > in_vis_thre, vd > in_vis_thre)
e = e[ind]
ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0
return ious
def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None):
"""
greedily select boxes with high confidence and overlap with current maximum <= thresh
rule out overlap >= thresh, overlap = oks
:param kpts_db
:param thresh: retain overlap < thresh
:return: indexes to keep
"""
if len(kpts_db) == 0:
return []
scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))])
kpts = np.array([kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))])
areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))])
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], sigmas, in_vis_thre)
inds = np.where(oks_ovr <= thresh)[0]
order = order[inds + 1]
return keep
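A toy usage sketch for the pure-Python nms above (numbers invented for illustration); oks_nms follows the same greedy loop with OKS similarity in place of box IoU.

import numpy as np

dets = np.array([
    [0., 0., 10., 10., 0.9],      # kept: highest score
    [1., 1., 11., 11., 0.8],      # IoU with the first ~0.70 -> suppressed at 0.5
    [20., 20., 30., 30., 0.7],    # disjoint -> kept
], dtype=np.float32)

print(nms(dets, thresh=0.5))      # [0, 2]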

143
lib/nms/nms_kernel.cu Normal file

@@ -0,0 +1,143 @@
// ------------------------------------------------------------------
// Copyright (c) Microsoft
// Licensed under The MIT License
// Modified from MATLAB Faster R-CNN (https://github.com/shaoqingren/faster_rcnn)
// ------------------------------------------------------------------
#include "gpu_nms.hpp"
#include <vector>
#include <iostream>
#define CUDA_CHECK(condition) \
/* Code block avoids redefinition of cudaError_t error */ \
do { \
cudaError_t error = condition; \
if (error != cudaSuccess) { \
std::cout << cudaGetErrorString(error) << std::endl; \
} \
} while (0)
#define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
int const threadsPerBlock = sizeof(unsigned long long) * 8;
__device__ inline float devIoU(float const * const a, float const * const b) {
float left = max(a[0], b[0]), right = min(a[2], b[2]);
float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
float interS = width * height;
float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
return interS / (Sa + Sb - interS);
}
__global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
const float *dev_boxes, unsigned long long *dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
// if (row_start > col_start) return;
const int row_size =
min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
const int col_size =
min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
__shared__ float block_boxes[threadsPerBlock * 5];
if (threadIdx.x < col_size) {
block_boxes[threadIdx.x * 5 + 0] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
block_boxes[threadIdx.x * 5 + 1] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
block_boxes[threadIdx.x * 5 + 2] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
block_boxes[threadIdx.x * 5 + 3] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
block_boxes[threadIdx.x * 5 + 4] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
const float *cur_box = dev_boxes + cur_box_idx * 5;
int i = 0;
unsigned long long t = 0;
int start = 0;
if (row_start == col_start) {
start = threadIdx.x + 1;
}
for (i = start; i < col_size; i++) {
if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
t |= 1ULL << i;
}
}
const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
dev_mask[cur_box_idx * col_blocks + col_start] = t;
}
}
void _set_device(int device_id) {
int current_device;
CUDA_CHECK(cudaGetDevice(&current_device));
if (current_device == device_id) {
return;
}
// The call to cudaSetDevice must come before any calls to Get, which
// may perform initialization using the GPU.
CUDA_CHECK(cudaSetDevice(device_id));
}
void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
int boxes_dim, float nms_overlap_thresh, int device_id) {
_set_device(device_id);
float* boxes_dev = NULL;
unsigned long long* mask_dev = NULL;
const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
CUDA_CHECK(cudaMalloc(&boxes_dev,
boxes_num * boxes_dim * sizeof(float)));
CUDA_CHECK(cudaMemcpy(boxes_dev,
boxes_host,
boxes_num * boxes_dim * sizeof(float),
cudaMemcpyHostToDevice));
CUDA_CHECK(cudaMalloc(&mask_dev,
boxes_num * col_blocks * sizeof(unsigned long long)));
dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
DIVUP(boxes_num, threadsPerBlock));
dim3 threads(threadsPerBlock);
nms_kernel<<<blocks, threads>>>(boxes_num,
nms_overlap_thresh,
boxes_dev,
mask_dev);
std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
CUDA_CHECK(cudaMemcpy(&mask_host[0],
mask_dev,
sizeof(unsigned long long) * boxes_num * col_blocks,
cudaMemcpyDeviceToHost));
std::vector<unsigned long long> remv(col_blocks);
memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
int num_to_keep = 0;
for (int i = 0; i < boxes_num; i++) {
int nblock = i / threadsPerBlock;
int inblock = i % threadsPerBlock;
if (!(remv[nblock] & (1ULL << inblock))) {
keep_out[num_to_keep++] = i;
unsigned long long *p = &mask_host[0] + i * col_blocks;
for (int j = nblock; j < col_blocks; j++) {
remv[j] |= p[j];
}
}
}
*num_out = num_to_keep;
CUDA_CHECK(cudaFree(boxes_dev));
CUDA_CHECK(cudaFree(mask_dev));
}
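For reference, a Python sketch of the host-side reduction at the end of _nms (illustration only; the C++ loop above is authoritative). The kernel writes, for each box, one 64-bit word per column block whose set bits mark the boxes it suppresses; the host then walks boxes in score order, skipping any box whose bit is already set and ORing in the mask of each kept box.

THREADS_PER_BLOCK = 64  # one bit per box in a column block

def reduce_mask(mask_host, n_boxes):
    col_blocks = (n_boxes + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    remv = [0] * col_blocks                     # accumulated suppression bits
    keep = []
    for i in range(n_boxes):                    # boxes already sorted by score
        nblock, inblock = divmod(i, THREADS_PER_BLOCK)
        if not (remv[nblock] >> inblock) & 1:   # box i not suppressed so far
            keep.append(i)
            for j in range(nblock, col_blocks): # suppress boxes overlapping box i
                remv[j] |= mask_host[i * col_blocks + j]
    return keep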

140
lib/nms/setup.py Normal file

@@ -0,0 +1,140 @@
# --------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn)
# --------------------------------------------------------
import os
from os.path import join as pjoin
from setuptools import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np
def find_in_path(name, path):
"Find a file in a search path"
# Adapted from
# http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
for dir in path.split(os.pathsep):
binpath = pjoin(dir, name)
if os.path.exists(binpath):
return os.path.abspath(binpath)
return None
def locate_cuda():
"""Locate the CUDA environment on the system
Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
and values giving the absolute path to each directory.
Starts by looking for the CUDAHOME env variable. If not found, everything
is based on finding 'nvcc' in the PATH.
"""
# first check if the CUDAHOME env variable is in use
if 'CUDAHOME' in os.environ:
home = os.environ['CUDAHOME']
nvcc = pjoin(home, 'bin', 'nvcc')
else:
# otherwise, search the PATH for NVCC
default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
if nvcc is None:
raise EnvironmentError('The nvcc binary could not be '
'located in your $PATH. Either add it to your path, or set $CUDAHOME')
home = os.path.dirname(os.path.dirname(nvcc))
cudaconfig = {'home':home, 'nvcc':nvcc,
'include': pjoin(home, 'include'),
'lib64': pjoin(home, 'lib64')}
for k, v in cudaconfig.items():
if not os.path.exists(v):
raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
return cudaconfig
CUDA = locate_cuda()
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = np.get_include()
except AttributeError:
numpy_include = np.get_numpy_include()
def customize_compiler_for_nvcc(self):
"""inject deep into distutils to customize how the dispatch
to gcc/nvcc works.
If you subclass UnixCCompiler, it's not trivial to get your subclass
injected in, and still have the right customizations (i.e.
distutils.sysconfig.customize_compiler) run on it. So instead of going
the OO route, I have this. Note, it's kind of like a weird functional
subclassing going on."""
# tell the compiler it can process .cu
self.src_extensions.append('.cu')
# save references to the default compiler_so and _compile methods
default_compiler_so = self.compiler_so
super = self._compile
# now redefine the _compile method. This gets executed for each
# object but distutils doesn't have the ability to change compilers
# based on source extension: we add it.
def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
if os.path.splitext(src)[1] == '.cu':
# use the cuda for .cu files
self.set_executable('compiler_so', CUDA['nvcc'])
# use only a subset of the extra_postargs, which are 1-1 translated
# from the extra_compile_args in the Extension class
postargs = extra_postargs['nvcc']
else:
postargs = extra_postargs['gcc']
super(obj, src, ext, cc_args, postargs, pp_opts)
# reset the default compiler_so, which we might have changed for cuda
self.compiler_so = default_compiler_so
# inject our redefined _compile method into the class
self._compile = _compile
# run the customize_compiler
class custom_build_ext(build_ext):
def build_extensions(self):
customize_compiler_for_nvcc(self.compiler)
build_ext.build_extensions(self)
ext_modules = [
Extension(
"cpu_nms",
["cpu_nms.pyx"],
extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
include_dirs = [numpy_include]
),
Extension('gpu_nms',
['nms_kernel.cu', 'gpu_nms.pyx'],
library_dirs=[CUDA['lib64']],
libraries=['cudart'],
language='c++',
runtime_library_dirs=[CUDA['lib64']],
# this syntax is specific to this build system
# we're only going to use certain compiler args with nvcc and not with
# gcc the implementation of this trick is in customize_compiler() below
extra_compile_args={'gcc': ["-Wno-unused-function"],
'nvcc': ['-arch=sm_35',
'--ptxas-options=-v',
'-c',
'--compiler-options',
"'-fPIC'"]},
include_dirs = [numpy_include, CUDA['include']]
),
]
setup(
name='nms',
ext_modules=ext_modules,
# inject our custom trigger
cmdclass={'build_ext': custom_build_ext},
)
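As in upstream py-faster-rcnn, this script is presumably built in place (e.g. python setup.py build_ext --inplace run from lib/nms; the invocation is an assumption, since this commit ships no build step) so the compiled cpu_nms and gpu_nms extensions land next to nms.py. Note that -arch=sm_35 hard-codes the target CUDA architecture and may need adjusting for other GPUs.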

0
lib/utils/__init__.py Normal file

123
lib/utils/transforms.py Normal file

@@ -0,0 +1,123 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import cv2
def flip_back(output_flipped, matched_parts):
'''
output_flipped: numpy.ndarray(batch_size, num_joints, height, width)
'''
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
def fliplr_joints(joints, joints_vis, width, matched_parts):
"""
flip coords
"""
# Flip horizontal
joints[:, 0] = width - joints[:, 0] - 1
# Change left-right parts
for pair in matched_parts:
joints[pair[0], :], joints[pair[1], :] = \
joints[pair[1], :], joints[pair[0], :].copy()
joints_vis[pair[0], :], joints_vis[pair[1], :] = \
joints_vis[pair[1], :], joints_vis[pair[0], :].copy()
return joints*joints_vis, joints_vis
def transform_preds(coords, center, scale, output_size):
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
def get_affine_transform(center,
scale,
rot,
output_size,
shift=np.array([0, 0], dtype=np.float32),
inv=0):
if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
# promote a scalar scale to an isotropic 2-vector
scale = np.array([scale, scale])
scale_tmp = scale * 200.0
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = get_dir([0, src_w * -0.5], rot_rad)
dst_dir = np.array([0, dst_w * -0.5], np.float32)
src = np.zeros((3, 2), dtype=np.float32)
dst = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
src[2:, :] = get_3rd_point(src[0, :], src[1, :])
dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def get_3rd_point(a, b):
direct = a - b
return b + np.array([-direct[1], direct[0]], dtype=np.float32)
def get_dir(src_point, rot_rad):
sn, cs = np.sin(rot_rad), np.cos(rot_rad)
src_result = [0, 0]
src_result[0] = src_point[0] * cs - src_point[1] * sn
src_result[1] = src_point[0] * sn + src_point[1] * cs
return src_result
def crop(img, center, scale, output_size, rot=0):
trans = get_affine_transform(center, scale, rot, output_size)
dst_img = cv2.warpAffine(img,
trans,
(int(output_size[0]), int(output_size[1])),
flags=cv2.INTER_LINEAR)
return dst_img
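A small sketch of the center/scale convention assumed above: scale is in units of 200 pixels (scale_tmp = scale * 200.0), mirroring the pixel_std normalization in the COCO loader's _box2cs, and the box center maps to the center of the output patch.

import numpy as np

center = np.array([320., 240.], dtype=np.float32)
scale = np.array([1.0, 1.0], dtype=np.float32)   # covers a 200x200 pixel region

trans = get_affine_transform(center, scale, rot=0, output_size=[192, 256])
print(affine_transform(center, trans))           # [96., 128.], the patch center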

83
lib/utils/utils.py Normal file

@@ -0,0 +1,83 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import logging
import time
from pathlib import Path
import torch
import torch.optim as optim
from core.config import get_model_name
def create_logger(cfg, cfg_name, phase='train'):
root_output_dir = Path(cfg.OUTPUT_DIR)
# set up logger
if not root_output_dir.exists():
print('=> creating {}'.format(root_output_dir))
root_output_dir.mkdir()
dataset = cfg.DATASET.DATASET + '_' + cfg.DATASET.HYBRID_JOINTS_TYPE \
if cfg.DATASET.HYBRID_JOINTS_TYPE else cfg.DATASET.DATASET
dataset = dataset.replace(':', '_')
model, _ = get_model_name(cfg)
cfg_name = os.path.basename(cfg_name).split('.')[0]
final_output_dir = root_output_dir / dataset / model / cfg_name
print('=> creating {}'.format(final_output_dir))
final_output_dir.mkdir(parents=True, exist_ok=True)
time_str = time.strftime('%Y-%m-%d-%H-%M')
log_file = '{}_{}_{}.log'.format(cfg_name, time_str, phase)
final_log_file = final_output_dir / log_file
head = '%(asctime)-15s %(message)s'
logging.basicConfig(filename=str(final_log_file),
format=head)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
console = logging.StreamHandler()
logging.getLogger('').addHandler(console)
tensorboard_log_dir = Path(cfg.LOG_DIR) / dataset / model / \
(cfg_name + '_' + time_str)
print('=> creating {}'.format(tensorboard_log_dir))
tensorboard_log_dir.mkdir(parents=True, exist_ok=True)
return logger, str(final_output_dir), str(tensorboard_log_dir)
def get_optimizer(cfg, model):
optimizer = None
if cfg.TRAIN.OPTIMIZER == 'sgd':
optimizer = optim.SGD(
model.parameters(),
lr=cfg.TRAIN.LR,
momentum=cfg.TRAIN.MOMENTUM,
weight_decay=cfg.TRAIN.WD,
nesterov=cfg.TRAIN.NESTEROV
)
elif cfg.TRAIN.OPTIMIZER == 'adam':
optimizer = optim.Adam(
model.parameters(),
lr=cfg.TRAIN.LR
)
return optimizer
def save_checkpoint(states, is_best, output_dir,
filename='checkpoint.pth.tar'):
torch.save(states, os.path.join(output_dir, filename))
if is_best and 'state_dict' in states:
torch.save(states['state_dict'],
os.path.join(output_dir, 'model_best.pth.tar'))
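For orientation: create_logger derives its output locations from the config, writing checkpoints and text logs under OUTPUT_DIR/<dataset>/<model>/<cfg_name> and TensorBoard events under LOG_DIR/<dataset>/<model>/<cfg_name>_<timestamp>, creating both trees on first use.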

141
lib/utils/vis.py Executable file

@@ -0,0 +1,141 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import torchvision
import cv2
from core.inference import get_max_preds
def save_batch_image_with_joints(batch_image, batch_joints, batch_joints_vis,
file_name, nrow=8, padding=2):
'''
batch_image: [batch_size, channel, height, width]
batch_joints: [batch_size, num_joints, 3],
batch_joints_vis: [batch_size, num_joints, 1],
'''
grid = torchvision.utils.make_grid(batch_image, nrow, padding, True)
ndarr = grid.mul(255).clamp(0, 255).byte().permute(1, 2, 0).cpu().numpy()
ndarr = ndarr.copy()
nmaps = batch_image.size(0)
xmaps = min(nrow, nmaps)
ymaps = int(math.ceil(float(nmaps) / xmaps))
height = int(batch_image.size(2) + padding)
width = int(batch_image.size(3) + padding)
k = 0
for y in range(ymaps):
for x in range(xmaps):
if k >= nmaps:
break
joints = batch_joints[k]
joints_vis = batch_joints_vis[k]
for joint, joint_vis in zip(joints, joints_vis):
joint[0] = x * width + padding + joint[0]
joint[1] = y * height + padding + joint[1]
if joint_vis[0]:
cv2.circle(ndarr, (int(joint[0]), int(joint[1])), 2, [255, 0, 0], 2)
k = k + 1
cv2.imwrite(file_name, ndarr)
def save_batch_heatmaps(batch_image, batch_heatmaps, file_name,
normalize=True):
'''
batch_image: [batch_size, channel, height, width]
batch_heatmaps: [batch_size, num_joints, height, width]
file_name: saved file name
'''
if normalize:
batch_image = batch_image.clone()
min = float(batch_image.min())
max = float(batch_image.max())
batch_image.add_(-min).div_(max - min + 1e-5)
batch_size = batch_heatmaps.size(0)
num_joints = batch_heatmaps.size(1)
heatmap_height = batch_heatmaps.size(2)
heatmap_width = batch_heatmaps.size(3)
grid_image = np.zeros((batch_size*heatmap_height,
(num_joints+1)*heatmap_width,
3),
dtype=np.uint8)
preds, maxvals = get_max_preds(batch_heatmaps.detach().cpu().numpy())
for i in range(batch_size):
image = batch_image[i].mul(255)\
.clamp(0, 255)\
.byte()\
.permute(1, 2, 0)\
.cpu().numpy()
heatmaps = batch_heatmaps[i].mul(255)\
.clamp(0, 255)\
.byte()\
.cpu().numpy()
resized_image = cv2.resize(image,
(int(heatmap_width), int(heatmap_height)))
height_begin = heatmap_height * i
height_end = heatmap_height * (i + 1)
for j in range(num_joints):
cv2.circle(resized_image,
(int(preds[i][j][0]), int(preds[i][j][1])),
1, [0, 0, 255], 1)
heatmap = heatmaps[j, :, :]
colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
masked_image = colored_heatmap*0.7 + resized_image*0.3
cv2.circle(masked_image,
(int(preds[i][j][0]), int(preds[i][j][1])),
1, [0, 0, 255], 1)
width_begin = heatmap_width * (j+1)
width_end = heatmap_width * (j+2)
grid_image[height_begin:height_end, width_begin:width_end, :] = \
masked_image
# grid_image[height_begin:height_end, width_begin:width_end, :] = \
# colored_heatmap*0.7 + resized_image*0.3
grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image
cv2.imwrite(file_name, grid_image)
def save_debug_images(config, input, meta, target, joints_pred, output,
prefix):
if not config.DEBUG.DEBUG:
return
if config.DEBUG.SAVE_BATCH_IMAGES_GT:
save_batch_image_with_joints(
input, meta['joints'], meta['joints_vis'],
'{}_gt.jpg'.format(prefix)
)
if config.DEBUG.SAVE_BATCH_IMAGES_PRED:
save_batch_image_with_joints(
input, joints_pred, meta['joints_vis'],
'{}_pred.jpg'.format(prefix)
)
if config.DEBUG.SAVE_HEATMAPS_GT:
save_batch_heatmaps(
input, target, '{}_hm_gt.jpg'.format(prefix)
)
if config.DEBUG.SAVE_HEATMAPS_PRED:
save_batch_heatmaps(
input, output, '{}_hm_pred.jpg'.format(prefix)
)

70
lib/utils/zipreader.py Normal file

@@ -0,0 +1,70 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import zipfile
import xml.etree.ElementTree as ET
import cv2
import numpy as np
_im_zfile = []
_xml_path_zip = []
_xml_zfile = []
def imread(filename, flags=cv2.IMREAD_COLOR):
global _im_zfile
path = filename
pos_at = path.find('@')
if pos_at == -1:
print("character '@' is not found in the given path '%s'" % (path))
assert 0
path_zip = path[0: pos_at]
path_img = path[pos_at + 2:]
if not os.path.isfile(path_zip):
print("zip file '%s' is not found"%(path_zip))
assert 0
for i in range(len(_im_zfile)):
if _im_zfile[i]['path'] == path_zip:
data = _im_zfile[i]['zipfile'].read(path_img)
return cv2.imdecode(np.frombuffer(data, np.uint8), flags)
_im_zfile.append({
'path': path_zip,
'zipfile': zipfile.ZipFile(path_zip, 'r')
})
data = _im_zfile[-1]['zipfile'].read(path_img)
return cv2.imdecode(np.frombuffer(data, np.uint8), flags)
def xmlread(filename):
global _xml_path_zip
global _xml_zfile
path = filename
pos_at = path.find('@')
if pos_at == -1:
print("character '@' is not found in the given path '%s'" % (path))
assert 0
path_zip = path[0: pos_at]
path_xml = path[pos_at + 2:]
if not os.path.isfile(path_zip):
print("zip file '%s' is not found"%(path_zip))
assert 0
for i in range(len(_xml_path_zip)):
if _xml_path_zip[i] == path_zip:
data = _xml_zfile[i].open(path_xml)
return ET.fromstring(data.read())
_xml_path_zip.append(path_zip)
print("read new xml file '%s'"%(path_zip))
_xml_zfile.append(zipfile.ZipFile(path_zip, 'r'))
data = _xml_zfile[-1].open(path_xml)
return ET.fromstring(data.read())
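The '@' path convention consumed here matches what the COCO loader builds (prefix + '.zip@' joined with the file name): everything before '@' names the archive, and path[pos_at + 2:] drops the '@' plus the separator that os.path.join inserts. A hypothetical example:

# path is illustrative: reads member '000000119993.jpg' out of train2017.zip
img = imread('data/coco/images/train2017.zip@/000000119993.jpg')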

23
pose_estimation/_init_paths.py Normal file

@@ -0,0 +1,23 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path as osp
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = osp.dirname(__file__)
lib_path = osp.join(this_dir, '..', 'lib')
add_path(lib_path)

206
pose_estimation/train.py Executable file

@@ -0,0 +1,206 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import pprint
import shutil
import torch
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from tensorboardX import SummaryWriter
import _init_paths
from core.config import config
from core.config import update_config
from core.config import update_dir
from core.config import get_model_name
from core.loss import JointsMSELoss
from core.function import train
from core.function import validate
from utils.utils import get_optimizer
from utils.utils import save_checkpoint
from utils.utils import create_logger
import dataset
import models
def parse_args():
parser = argparse.ArgumentParser(description='Train keypoints network')
# general
parser.add_argument('--cfg',
help='experiment configure file name',
required=True,
type=str)
args, rest = parser.parse_known_args()
# update config
update_config(args.cfg)
# training
parser.add_argument('--frequent',
help='frequency of logging',
default=config.PRINT_FREQ,
type=int)
parser.add_argument('--gpus',
help='gpus',
type=str)
parser.add_argument('--workers',
help='num of dataloader workers',
type=int)
args = parser.parse_args()
return args
def reset_config(config, args):
if args.gpus:
config.GPUS = args.gpus
if args.workers:
config.WORKERS = args.workers
def main():
args = parse_args()
reset_config(config, args)
logger, final_output_dir, tb_log_dir = create_logger(
config, args.cfg, 'train')
logger.info(pprint.pformat(args))
logger.info(pprint.pformat(config))
# cudnn related setting
cudnn.benchmark = config.CUDNN.BENCHMARK
torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC
torch.backends.cudnn.enabled = config.CUDNN.ENABLED
model = eval('models.'+config.MODEL.NAME+'.get_pose_net')(
config, is_train=True
)
# copy model file
this_dir = os.path.dirname(__file__)
shutil.copy2(
os.path.join(this_dir, '../lib/models', config.MODEL.NAME + '.py'),
final_output_dir)
writer_dict = {
'writer': SummaryWriter(log_dir=tb_log_dir),
'train_global_steps': 0,
'valid_global_steps': 0,
}
dump_input = torch.rand((config.TRAIN.BATCH_SIZE,
3,  # RGB input channels expected by conv1
config.MODEL.IMAGE_SIZE[1],
config.MODEL.IMAGE_SIZE[0]))
writer_dict['writer'].add_graph(model, (dump_input, ))
gpus = [int(i) for i in config.GPUS.split(',')]
model = torch.nn.DataParallel(model, device_ids=gpus).cuda()
# define loss function (criterion) and optimizer
criterion = JointsMSELoss(
use_target_weight=config.LOSS.USE_TARGET_WEIGHT
).cuda()
optimizer = get_optimizer(config, model)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(
optimizer, config.TRAIN.LR_STEP, config.TRAIN.LR_FACTOR
)
# Data loading code
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_dataset = eval('dataset.'+config.DATASET.DATASET)(
config,
config.DATASET.ROOT,
config.DATASET.TRAIN_SET,
True,
transforms.Compose([
transforms.ToTensor(),
normalize,
])
)
valid_dataset = eval('dataset.'+config.DATASET.DATASET)(
config,
config.DATASET.ROOT,
config.DATASET.TEST_SET,
False,
transforms.Compose([
transforms.ToTensor(),
normalize,
])
)
train_loader = torch.utils.data.DataLoader(
train_dataset,
batch_size=config.TRAIN.BATCH_SIZE*len(gpus),
shuffle=config.TRAIN.SHUFFLE,
num_workers=config.WORKERS,
pin_memory=True
)
valid_loader = torch.utils.data.DataLoader(
valid_dataset,
batch_size=config.TEST.BATCH_SIZE*len(gpus),
shuffle=False,
num_workers=config.WORKERS,
pin_memory=True
)
best_perf = 0.0
best_model = False
for epoch in range(config.TRAIN.BEGIN_EPOCH, config.TRAIN.END_EPOCH):
lr_scheduler.step()
# train for one epoch
train(config, train_loader, model, criterion, optimizer, epoch,
final_output_dir, tb_log_dir, writer_dict)
# evaluate on validation set
perf_indicator = validate(config, valid_loader, valid_dataset, model,
criterion, final_output_dir, tb_log_dir,
writer_dict)
if perf_indicator > best_perf:
best_perf = perf_indicator
best_model = True
else:
best_model = False
logger.info('=> saving checkpoint to {}'.format(final_output_dir))
save_checkpoint({
'epoch': epoch + 1,
'model': get_model_name(config),
'state_dict': model.state_dict(),
'perf': perf_indicator,
'optimizer': optimizer.state_dict(),
}, best_model, final_output_dir)
final_model_state_file = os.path.join(final_output_dir,
'final_state.pth.tar')
logger.info('saving final model state to {}'.format(
final_model_state_file))
torch.save(model.module.state_dict(), final_model_state_file)
writer_dict['writer'].close()
if __name__ == '__main__':
main()
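Typical invocation is python pose_estimation/train.py --cfg <experiment>.yaml [--gpus 0,1 --workers 4]; the YAML path is illustrative, as the experiment configs live elsewhere in this commit. The two-stage parsing in parse_args() is deliberate: parse_known_args() runs first so update_config(args.cfg) can load the YAML before --frequent takes its default from the updated config.PRINT_FREQ.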

165
pose_estimation/valid.py Executable file

@@ -0,0 +1,165 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import pprint
import torch
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import _init_paths
from core.config import config
from core.config import update_config
from core.config import update_dir
from core.loss import JointsMSELoss
from core.function import validate
from utils.utils import create_logger
import dataset
import models
def parse_args():
parser = argparse.ArgumentParser(description='Validate keypoints network')
# general
parser.add_argument('--cfg',
help='experiment configure file name',
required=True,
type=str)
args, rest = parser.parse_known_args()
# update config
update_config(args.cfg)
# testing
parser.add_argument('--frequent',
help='frequency of logging',
default=config.PRINT_FREQ,
type=int)
parser.add_argument('--gpus',
help='gpus',
type=str)
parser.add_argument('--workers',
help='num of dataloader workers',
type=int)
parser.add_argument('--model-file',
help='model state file',
type=str)
parser.add_argument('--use-detect-bbox',
help='use detect bbox',
action='store_true')
parser.add_argument('--flip-test',
help='use flip test',
action='store_true')
parser.add_argument('--post-process',
help='use post process',
action='store_true')
parser.add_argument('--shift-heatmap',
help='shift heatmap',
action='store_true')
parser.add_argument('--coco-bbox-file',
help='coco detection bbox file',
type=str)
args = parser.parse_args()
return args
def reset_config(config, args):
if args.gpus:
config.GPUS = args.gpus
if args.workers:
config.WORKERS = args.workers
if args.use_detect_bbox:
config.TEST.USE_GT_BBOX = not args.use_detect_bbox
if args.flip_test:
config.TEST.FLIP_TEST = args.flip_test
if args.post_process:
config.TEST.POST_PROCESS = args.post_process
if args.shift_heatmap:
config.TEST.SHIFT_HEATMAP = args.shift_heatmap
if args.model_file:
config.TEST.MODEL_FILE = args.model_file
if args.coco_bbox_file:
config.TEST.COCO_BBOX_FILE = args.coco_bbox_file
def main():
args = parse_args()
reset_config(config, args)
logger, final_output_dir, tb_log_dir = create_logger(
config, args.cfg, 'valid')
logger.info(pprint.pformat(args))
logger.info(pprint.pformat(config))
# cudnn related setting
cudnn.benchmark = config.CUDNN.BENCHMARK
torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC
torch.backends.cudnn.enabled = config.CUDNN.ENABLED
model = eval('models.'+config.MODEL.NAME+'.get_pose_net')(
config, is_train=False
)
if config.TEST.MODEL_FILE:
logger.info('=> loading model from {}'.format(config.TEST.MODEL_FILE))
model.load_state_dict(torch.load(config.TEST.MODEL_FILE))
else:
model_state_file = os.path.join(final_output_dir,
'final_state.pth.tar')
logger.info('=> loading model from {}'.format(model_state_file))
model.load_state_dict(torch.load(model_state_file))
gpus = [int(i) for i in config.GPUS.split(',')]
model = torch.nn.DataParallel(model, device_ids=gpus).cuda()
# define loss function (criterion)
criterion = JointsMSELoss(
use_target_weight=config.LOSS.USE_TARGET_WEIGHT
).cuda()
# Data loading code
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
valid_dataset = eval('dataset.'+config.DATASET.DATASET)(
config,
config.DATASET.ROOT,
config.DATASET.TEST_SET,
False,
transforms.Compose([
transforms.ToTensor(),
normalize,
])
)
valid_loader = torch.utils.data.DataLoader(
valid_dataset,
batch_size=config.TEST.BATCH_SIZE*len(gpus),
shuffle=False,
num_workers=config.WORKERS,
pin_memory=True
)
# evaluate on validation set
validate(config, valid_loader, valid_dataset, model, criterion,
final_output_dir, tb_log_dir)
if __name__ == '__main__':
main()

9
requirements.txt Normal file

@@ -0,0 +1,9 @@
EasyDict==1.7
opencv-python==3.4.1.15
Cython
scipy
pandas
pyyaml
json_tricks
scikit-image
tensorboardX