Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019)

This is a python implementation of the following paper:

Y. Deng, J. Yang, S. Xu, D. Chen, Y. Jia, and X. Tong, Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set, IEEE Computer Vision and Pattern Recognition Workshop (CVPRW) on Analysis and Modeling of Faces and Gestures (AMFG), 2019. (Best Paper Award!)

The method enforces hybrid-level weak supervision for training a CNN-based 3D face reconstruction network. It is fast, accurate, and robust to pose and occlusions. It achieves state-of-the-art performance on multiple datasets such as FaceWarehouse, MICC Florence, and BU-3DFE.

Features

● Accurate shapes

The method reconstructs faces with high accuracy. Quantitative evaluations (shape errors in mm) on several benchmarks show its state-of-the-art performance:

| Method | FaceWareHouse | Florence | BU-3DFE |
|---|---|---|---|
| Tewari et al. 17 | 2.19±0.54 | - | - |
| Tewari et al. 18 | 1.84±0.38 | - | - |
| Genova et al. 18 | - | 1.77±0.53 | - |
| Sela et al. 17 | - | - | 2.91±0.60 |
| PRN 18 | - | - | 1.86±0.47 |
| Ours | 1.81±0.50 | 1.67±0.50 | 1.40±0.31 |

● High fidelity textures

The method produces high-fidelity face textures while preserving the identity information of the input images. Scene illumination is also disentangled to guarantee a pure albedo.

● Robust

The method provides reasonable results under extreme conditions such as large poses and heavy occlusions.

● Aligned with images

Our method aligns the reconstructed faces with the input images. It provides face pose estimation and 68 facial landmarks, which are useful for other tasks. We evaluate alignment on the AFLW_2000 dataset (normalized mean error, NME; lower is better), as shown in the table below:

| Method | [0°,30°] | [30°,60°] | [60°,90°] | Overall |
|---|---|---|---|---|
| 3DDFA 16 | 3.78 | 4.54 | 7.93 | 5.42 |
| 3DDFA+SDM 16 | 3.43 | 4.24 | 7.17 | 4.94 |
| Bulat et al. 17 | 2.47 | 3.01 | 4.31 | 3.26 |
| PRN 18 | 2.75 | 3.51 | 4.61 | 3.62 |
| Ours | 2.56 | 3.11 | 4.45 | 3.37 |
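
For reference, the NME is the mean 2D landmark error normalized by the size of the ground-truth bounding box. A minimal sketch follows; the exact normalization (square root of the box area, the protocol commonly used on AFLW2000-3D) is an assumption here, not taken from the paper:

```python
import numpy as np

def nme(pred, gt):
    """Normalized mean error over 68 landmarks.

    pred, gt: (68, 2) arrays of predicted and ground-truth 2D landmarks.
    The error is normalized by sqrt(w * h) of the ground-truth bounding
    box (an assumed protocol, common for AFLW2000-3D evaluations).
    """
    w = gt[:, 0].max() - gt[:, 0].min()
    h = gt[:, 1].max() - gt[:, 1].min()
    return np.linalg.norm(pred - gt, axis=1).mean() / np.sqrt(w * h)
```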

● Easy and Fast

Faces are represented with the Basel Face Model 2009, which makes further manipulation easy (e.g., expression transfer, as sketched below). ResNet-50 is used as the backbone network, achieving over 50 fps (on a GTX 1080) for reconstruction.
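
To illustrate why the linear 3DMM parameterization makes such manipulation easy, here is a minimal sketch of expression transfer. The basis array names are hypothetical placeholders, and the 80/64 identity/expression split follows the coefficient layout used in this repository:

```python
import numpy as np

def reconstruct_shape(mu, id_basis, exp_basis, id_coef, exp_coef):
    """Linear 3DMM: mean shape plus identity and expression offsets.

    mu: (3N,) mean shape; id_basis: (3N, 80); exp_basis: (3N, 64).
    Returns an (N, 3) vertex array.
    """
    shape = mu + id_basis @ id_coef + exp_basis @ exp_coef
    return shape.reshape(-1, 3)

def transfer_expression(src_coeff, tgt_coeff):
    """Give the target face the source face's expression.

    Both arguments are R-Net coefficient vectors; entries [0:80] encode
    identity and [80:144] encode expression (assumed layout).
    """
    out = tgt_coeff.copy()
    out[80:144] = src_coeff[80:144]
    return out
```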

Getting Started

Prerequisites

Optional:

  • tf mesh renderer (we use it as the renderer during training; it can also be used at the test stage; Linux only)

Usage

  1. Clone the repository:

     git clone https://github.com/Microsoft/Deep3DFaceReconstruction
     cd Deep3DFaceReconstruction

  2. Download the BFM09 model and put "01_MorphableModel.mat" into the ./BFM subfolder.

  3. Download the Expression Basis provided by Guo et al. (you can find a link named CoarseData in the first row of the Introduction part of their repository; download and unzip Coarse_Dataset.zip), and put "Exp_Pca.bin" into the ./BFM subfolder.

  4. Download the trained model from GoogleDrive and put it into the ./network subfolder.

  5. Run the demo code:

     python demo.py

  6. To check the results, see the ./output subfolder, which contains:
    • "xxx.mat": the cropped input image, the corresponding 5-point and 68-point landmarks, and the output coefficients of R-Net.
    • "xxx_mesh.obj": the 3D face mesh in a canonical view (best viewed in MeshLab).

Tips

  1. The model is trained without data augmentation, so pre-alignment with 5 facial landmarks is necessary. We put some examples in the ./input subfolder for reference.

  2. The current model is trained under the assumption of 3-channel scene illumination (instead of the white light described in the paper); a sketch of the corresponding shading model follows these tips.

  3. We exclude the ear and neck regions of the original BFM09 model. To see which vertices are preserved, check select_vertex_id.mat in the ./BFM subfolder. Note that the indices start from 1 (see the second sketch after these tips).

  4. If you have any questions, please contact Yu Deng (v-denyu@microsoft.com) or Jiaolong Yang (jiaoyan@microsoft.com).
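
Regarding tip 2, 3-channel illumination means one set of 9 spherical-harmonics coefficients per RGB channel (27 in total). A minimal sketch of applying such coefficients; the SH constants are the standard first-three-band values, while the (3, 9) layout of gamma is an assumption:

```python
import numpy as np

def sh_shading(normals, gamma):
    """Shade with 3-band real spherical harmonics, one 9-vector per RGB channel.

    normals: (N, 3) unit vertex normals; gamma: (27,) illumination coefficients.
    Returns an (N, 3) array of per-vertex irradiance multipliers.
    """
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    # Standard real SH basis functions for bands 0-2 (9 terms).
    sh = np.stack([
        0.2820948 * np.ones_like(x),   # Y_0^0
        0.4886025 * y,                 # Y_1^-1
        0.4886025 * z,                 # Y_1^0
        0.4886025 * x,                 # Y_1^1
        1.0925484 * x * y,             # Y_2^-2
        1.0925484 * y * z,             # Y_2^-1
        0.3153916 * (3 * z**2 - 1),    # Y_2^0
        1.0925484 * x * z,             # Y_2^1
        0.5462742 * (x**2 - y**2),     # Y_2^2
    ], axis=1)                         # (N, 9)
    return sh @ gamma.reshape(3, 9).T  # (N, 3): one column per channel

# Shaded color = per-vertex albedo modulated channel-wise by the irradiance:
# color = albedo * sh_shading(normals, gamma)
```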
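And for tip 3, the 1-based indices in select_vertex_id.mat must be shifted before indexing arrays in Python. A minimal sketch; the key name 'select_id' inside the .mat file is an assumption:

```python
import numpy as np
from scipy.io import loadmat

# Load the preserved-vertex indices; the variable name inside the .mat
# file ('select_id') is an assumption and may differ.
select_id = np.squeeze(loadmat('./BFM/select_vertex_id.mat')['select_id'])

# The indices are 1-based (MATLAB convention); convert for NumPy indexing.
select_id = select_id.astype(np.int64) - 1

# e.g., keep only the preserved vertices of a full BFM09 shape (N, 3):
# cropped_vertices = full_vertices[select_id]
```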

Citation

Please cite the following paper if this model helps your research:

@inproceedings{deng2019accurate,
    title={Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set},
    author={Yu Deng and Jiaolong Yang and Sicheng Xu and Dong Chen and Yunde Jia and Xin Tong},
    booktitle={IEEE Computer Vision and Pattern Recognition Workshops},
    year={2019}
}