Introduction
This repository holds the code for the following AAAI-2020 paper: Semantics-Aligned Representation Learning for Person Re-identification.
Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. Note that this work targets applications such as finding lost children and customer density analysis in retail stores. Person reID is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc.
We propose a framework that drives the reID network to learn semantics-aligned feature representations through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN) which consists of a base network as the encoder (SA-Enc) for reID, and a decoder (SA-Dec) for reconstructing/regressing the densely semantics aligned full texture image. We jointly train the SAN under the supervision of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as perceptual losses. The decoder is discarded at inference, so our scheme is computationally efficient. Our design significantly outperforms the baseline and achieves state-of-the-art performance.
Figure 1: Illustration of the proposed Semantics Aligning Network (SAN). It consists of a base network as the encoder (SA-Enc) and a decoder sub-network (SA-Dec). The reID feature vector is obtained by average pooling the feature map of the SA-Enc, followed by the reID losses. To encourage the encoder to learn semantically aligned features, the SA-Dec follows the encoder and regresses the densely semantically aligned full texture image under pseudo groundtruth supervision. At inference, the SA-Dec is discarded.
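To make the joint training objective concrete, below is a minimal PyTorch-style sketch of how an encoder/decoder pair and the combined losses could be wired together. It is an illustration under stated assumptions, not the exact code in this repo: the class and function names, the choice of L1 as the reconstruction loss, and the loss signatures are placeholders, and the perceptual Triplet ReID constraints at the decoder are omitted for brevity.

```python
# Illustrative sketch only -- names, dimensions, and loss choices are assumptions.
import torch
import torch.nn as nn


class SAN(nn.Module):
    """Semantics Aligning Network: an encoder (SA-Enc) for reID plus a decoder
    (SA-Dec) that regresses the densely semantically aligned texture image."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module, num_ids: int, feat_dim: int = 2048):
        super().__init__()
        self.encoder = encoder                    # SA-Enc: e.g. a ResNet backbone producing a feature map
        self.decoder = decoder                    # SA-Dec: upsampling sub-network producing a texture image
        self.classifier = nn.Linear(feat_dim, num_ids)  # identity classification head

    def forward(self, images):
        feat_map = self.encoder(images)           # B x C x h x w feature map
        feat_vec = feat_map.mean(dim=(2, 3))      # global average pooling -> reID feature vector
        logits = self.classifier(feat_vec)
        texture_pred = self.decoder(feat_map)     # only needed during training
        return feat_vec, logits, texture_pred


def san_loss(feat_vec, logits, texture_pred, texture_gt, labels,
             id_loss, triplet_loss, w_rec=1.0, w_tri=1.0):
    """Joint objective: reID losses on the encoder features plus a reconstruction
    loss on the regressed texture image (decoder-side perceptual constraints omitted)."""
    loss_id = id_loss(logits, labels)                     # e.g. cross-entropy over identities
    loss_tri = triplet_loss(feat_vec, labels)             # Triplet ReID loss on pooled features
    loss_rec = nn.functional.l1_loss(texture_pred, texture_gt)  # reconstruction loss (L1 assumed)
    return loss_id + w_tri * loss_tri + w_rec * loss_rec
```

At inference time only the encoder and the pooling step would be kept, which is what makes the scheme cheap at test time.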
Synthesized Paired-Image-Texture Dataset (PIT Dataset)
To train the SAN-PG, we synthesize a Paired-Image-Texture dataset (PIT dataset) based on the SURREAL dataset, for the purpose of providing image pairs, i.e., a person image and its texture image. The texture image stores the RGB texture of the full person 3D surface. In particular, we use the 929 raster-scanned texture maps provided by the SURREAL dataset to generate the image pairs. In SURREAL, all faces in the texture images are replaced by the average face of either a man or a woman. We generate 9,290 different meshes of diverse poses/shapes/viewpoints. For each texture map, we assign 10 different meshes and render these 3D meshes with the texture image. We then obtain in total 9,290 different synthesized (person image, texture image) pairs. To simulate real-world scenes, the background images for rendering are randomly sampled from the COCO dataset. Each synthetic person image is centered on a person with resolution 256x128. The resolution of the texture images is 256x256. The PIT dataset can be downloaded from here.
Figure 2: Examples of texture images (first row) and the corresponding synthesized person images with different poses, viewpoints, and backgrounds (second row). A texture image represents the full texture of the 3D human surface in a surface-based canonical coordinate system (UV space). Each position (u,v) corresponds to a unique semantic identity. For person images of different persons/poses/viewpoints (in the second row), their corresponding texture images are densely semantically aligned.
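Since SAN-PG training simply consumes (person image, texture image) pairs, a paired loader is the natural entry point. The sketch below assumes the downloaded PIT dataset stores matching file names in two folders, one for the 256x128 person images and one for the 256x256 texture images; the folder layout and naming convention are assumptions, so adapt them to the actual download.

```python
# Minimal sketch of a paired (person image, texture image) dataset for SAN-PG training.
# The directory layout and file-naming convention are assumptions, not the official ones.
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T


class PITPairs(Dataset):
    def __init__(self, image_dir, texture_dir):
        self.image_dir = image_dir
        self.texture_dir = texture_dir
        self.names = sorted(os.listdir(image_dir))  # assume matching file names in both folders
        self.img_tf = T.Compose([T.Resize((256, 128)), T.ToTensor()])  # person images: 256x128
        self.tex_tf = T.Compose([T.Resize((256, 256)), T.ToTensor()])  # texture images: 256x256

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        person = Image.open(os.path.join(self.image_dir, name)).convert('RGB')
        texture = Image.open(os.path.join(self.texture_dir, name)).convert('RGB')
        return self.img_tf(person), self.tex_tf(texture)
```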
Installation
- Git clone this repo.
- Install dependencies by `pip install -r requirements.txt` (if necessary).
- To install the cython-based evaluation toolbox, `cd` to `torchreid/eval_cylib` and do `make`. As a result, `eval_metrics_cy.so` is generated under the same folder. Run `python test_cython.py` to test if the toolbox is installed successfully. (credit to luzai)
ReID Dataset Preparation
Here we use the CUHK03 dataset as an example. See `torchreid/datasets/__init__.py` for details. The data managers for image reID are implemented in `torchreid/data_manager.py`.
- Create a folder named `cuhk03/` under `/YOUR_DATASET_PATH/`.
- Download the dataset to `/YOUR_DATASET_PATH/cuhk03/` from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract `cuhk03_release.zip`, so you will have `/YOUR_DATASET_PATH/cuhk03/cuhk03_release`.
- Download the train/test split protocol from person-re-ranking. What you need are `cuhk03_new_protocol_config_detected.mat` and `cuhk03_new_protocol_config_labeled.mat`. Put the two mat files under `/YOUR_DATASET_PATH/cuhk03/`. Finally, the data structure would look like

      cuhk03/
          cuhk03_release/
          cuhk03_new_protocol_config_detected.mat
          cuhk03_new_protocol_config_labeled.mat
          ...

- Use `-d cuhk03` when running the training code. In the default mode, we use the new split protocol (767/700). In addition, CUHK03 provides both `labeled` and `detected` modes; please specify `--cuhk03-labeled` to train and test on `labeled` images. (A quick sanity check of the expected layout is sketched below.)
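As mentioned above, a quick sanity check of the expected CUHK03 layout can save a failed run. The helper below is hypothetical (it is not part of this repo) and only verifies the files and folders listed in the structure above.

```python
# Hypothetical helper (not part of this repo): verify the CUHK03 layout described above.
from pathlib import Path


def check_cuhk03(root):
    root = Path(root) / 'cuhk03'
    expected = [
        root / 'cuhk03_release',
        root / 'cuhk03_new_protocol_config_detected.mat',
        root / 'cuhk03_new_protocol_config_labeled.mat',
    ]
    missing = [str(p) for p in expected if not p.exists()]
    if missing:
        raise FileNotFoundError('Missing CUHK03 files/folders: ' + ', '.join(missing))
    print('CUHK03 layout looks good:', root)


# Example usage:
# check_cuhk03('/YOUR_DATASET_PATH')
```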
Pseudo Groundtruth Texture Images Generation
We train a network for the purpose of generating pseudo groundtruth texture images for any given input person image. For simplicity, we reuse a simplified SAN (i.e., SAN-PG) which consists of the SA-Enc and SA-Dec, but with only the reconstruction loss. We train the SAN-PG on our synthesized PIT dataset. The SAN-PG model is then used to generate pseudo groundtruth texture images for the reID datasets.
Here we provide the pre-trained weights for SAN-PG and the corresponding pseudo texture image generation script `generate_texture.py`. You can generate the pseudo texture images for your own person images by running:
python generate_texture.py -m /DOWNLOADED_SAN-PG_WEIGHTS -i example_results/input -o example_results/texture
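For orientation, the following is a rough sketch of what such a generation step amounts to: load the SAN-PG model, run each input image through it, and save the predicted 256x256 texture. The preprocessing, the assumption that the model directly returns a texture tensor, and the output naming are all illustrative; refer to `generate_texture.py` for the actual implementation.

```python
# Rough sketch of pseudo groundtruth texture generation with a trained SAN-PG model.
# Model loading, preprocessing, and output conventions are assumptions; see
# generate_texture.py for the actual implementation.
import os
import torch
from PIL import Image
import torchvision.transforms as T
from torchvision.utils import save_image


def generate_textures(model, in_dir, out_dir, device='cuda'):
    os.makedirs(out_dir, exist_ok=True)
    tf = T.Compose([T.Resize((256, 128)), T.ToTensor()])  # person images are 256x128
    model.eval().to(device)
    with torch.no_grad():
        for name in sorted(os.listdir(in_dir)):
            img = tf(Image.open(os.path.join(in_dir, name)).convert('RGB'))
            texture = model(img.unsqueeze(0).to(device))   # assumed to return a 256x256 texture
            save_image(texture.clamp(0, 1), os.path.join(out_dir, name))
```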
For convenience, we also provide our generated pseudo groundtruth texture images for CUHK03 (labeled), i.e., `texture_cuhk03_labeled`.
- Place these generated pseudo groundtruth texture images of the CUHK03 dataset under `/YOUR_DATASET_PATH/cuhk03/`.
- Finally, the data structure would look like

      cuhk03/
          cuhk03_release/
          cuhk03_new_protocol_config_detected.mat
          cuhk03_new_protocol_config_labeled.mat
          texture_cuhk03_labeled
          ...
Train and Evaluation
python main.py \
--root DATASET_PATH \
-s cuhk03 \
-t cuhk03 \
--height 256 \
--width 128 \
--optim amsgrad \
--label-smooth \
--lr 8e-04 \
--max-epoch 300 \
--stepsize 40 80 120 160 200 240 280 \
--train-batch-size 64 \
--test-batch-size 100 \
-a resnet50_fc512 \
--save-dir SAVE_PATH \
--gpu-devices 0 \
--train-sampler RandomIdentitySampler \
--warm-up-epoch 20 \
--cuhk03-labeled \
--eval-freq 80
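For reference, the flags above describe a warm-up phase (`--warm-up-epoch 20`) followed by a multi-step decay at the `--stepsize` epochs. The sketch below reproduces such a schedule under two explicit assumptions, a linear warm-up and a decay factor of 0.1, either of which may differ from what `main.py` actually implements.

```python
# Sketch of the learning-rate schedule implied by the flags above.
# Linear warm-up and a 0.1 decay factor are assumptions; check main.py for the exact schedule.
def learning_rate(epoch, base_lr=8e-4, warmup_epochs=20,
                  steps=(40, 80, 120, 160, 200, 240, 280), gamma=0.1):
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs   # linear warm-up (assumption)
    decays = sum(1 for s in steps if epoch >= s)        # multi-step decay at --stepsize epochs
    return base_lr * (gamma ** decays)


if __name__ == '__main__':
    for e in (0, 19, 39, 40, 120, 299):
        print(e, learning_rate(e))
```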
Reference
If you find our technique and repo useful, please cite our paper. Thanks!
@article{jin2020semantics,
title={Semantics-aligned representation learning for person re-identification},
author={Jin, Xin and Lan, Cuiling and Zeng, Wenjun and Wei, Guoqiang and Chen, Zhibo},
journal={AAAI},
year={2020}
}
Microsoft Open Source Code of Conduct: https://opensource.microsoft.com/codeofconduct
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.