
PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision (arXiv)

Paper     Source Code     macOS Binary     Linux Binary



Top row: PeopleSansPeople generated images.
Bottom row: corresponding COCO-style bounding box and keypoint labels.

Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook, Saurav Dhakad,
Adam Crespi, Pete Parisi, Steven Borkman, Jonathan Hogins, Sujoy Ganguly
Unity Technologies


People + Sans (Middle English for “without”) + People

A data generator for human-centric computer vision tasks, with no need for real-world human data.


License: Apache 2.0   Unity 2020.3.20f1   Perception 0.9.0-preview.2


Summary

  • We introduce PeopleSansPeople, a privacy-preserving, human-centric synthetic data generator with highly parameterized domain randomization.
  • PeopleSansPeople contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels.
  • We use naïve ranges for the domain randomization and generate a synthetic dataset with labels.
  • We provide some guarantees and analysis of human activities, poses, and context diversity on our synthetic data.
  • We found that pre-training a network on synthetic data and fine-tuning on target real-world data (COCO-person train) resulted in a bbox AP of 57.44 and keypoint AP of 66.83 (COCO-person validation), outperforming models trained with the same real data alone (bbox AP of 56.73 and keypoint AP of 65.12).
Abstract

In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy concerns may limit the ability to collect more data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, the creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator, PeopleSansPeople, which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant. We found that pre-training a network using synthetic data and fine-tuning on target real-world data (COCO-person train) resulted in a bbox AP of 57.44 and keypoint AP of 66.83 (COCO-person validation), outperforming models trained with the same real data alone (bbox AP of 56.73 and keypoint AP of 65.12). This freely available data generator should enable a wide range of research into the emerging field of simulation-to-real transfer learning in the critical area of human-centric computer vision.

What does PeopleSansPeople provide?

  • 28 parameterized simulation-ready 3D human assets
  • 39 diverse animation clips
  • 21,952 unique clothing textures (from 28 albedos, 28 masks, and 28 normals)
  • Parameterized lighting
  • Parameterized camera system (see the randomization sketch after this list)
  • Natural backgrounds
  • Primitive occluders/distractors
  • All packaged in a macOS and Linux binary
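
To make the randomization concrete, below is a minimal Python sketch of the kind of naïve uniform-range parameter sampling the generator performs each scene iteration. The parameter names and bounds here are purely illustrative assumptions, not the binary's actual exposed parameters; the real randomizers are implemented in Unity with the Perception package.

```python
import random

# Hypothetical parameter ranges -- PeopleSansPeople exposes its own set of
# randomizer parameters; these names and bounds are illustrative only.
RANGES = {
    "light_intensity":  (0.1, 5.0),    # arbitrary units
    "light_rotation_x": (0.0, 360.0),  # degrees
    "camera_fov":       (30.0, 90.0),  # degrees
    "camera_height":    (0.5, 3.0),    # meters
    "num_humans":       (1, 10),       # instances per scene
}

def sample_scene_parameters(seed=None):
    """Draw one scene configuration with naive uniform sampling,
    mirroring the uniform-distribution randomization described above."""
    rng = random.Random(seed)
    params = {}
    for name, (lo, hi) in RANGES.items():
        if isinstance(lo, int) and isinstance(hi, int):
            params[name] = rng.randint(lo, hi)   # integer-valued parameter
        else:
            params[name] = rng.uniform(lo, hi)   # continuous parameter
    return params

print(sample_scene_parameters(seed=0))
```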

A comparison between PeopleSansPeople and the COCO person dataset.

|                  | # train | # validation | # instances (train) | # instances w/ kpts (train) |
|------------------|---------|--------------|---------------------|-----------------------------|
| COCO             | 64,115  | 2,693        | 262,465             | 149,813                     |
| PeopleSansPeople | 490,000 | 10,000       | >3,070,000          | >2,900,000                  |

Generated data and labels

PeopleSansPeople produces the following label types in COCO format: 2D bounding boxes, human keypoints, and semantic and instance segmentation masks. In addition, PeopleSansPeople generates 3D bounding boxes, which are provided in Unity's Perception format.

Generated image and corresponding labels: 2D bounding box, human keypoints, semantic and instance segmentation masks in COCO format. 3D bounding box annotations are provided separately in [Unity Perception](https://github.com/Unity-Technologies/com.unity.perception) format.
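
Because the labels are standard COCO JSON, they can be read with off-the-shelf tooling. Below is a minimal sketch using pycocotools; the annotation file path is a placeholder for wherever the generator's output lands.

```python
from pycocotools.coco import COCO  # pip install pycocotools

# Placeholder path: point this at the COCO-format JSON emitted by the generator.
coco = COCO("annotations/keypoints.json")

person_cat_ids = coco.getCatIds(catNms=["person"])
image_ids = coco.getImgIds(catIds=person_cat_ids)

# Inspect the annotations of the first image.
ann_ids = coco.getAnnIds(imgIds=image_ids[0], catIds=person_cat_ids)
for ann in coco.loadAnns(ann_ids):
    bbox = ann["bbox"]            # [x, y, width, height]
    keypoints = ann["keypoints"]  # flat [x1, y1, v1, x2, y2, v2, ...] per joint
    print(bbox, len(keypoints) // 3, "keypoints")
```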

Results

Here we show the gains obtained from pre-training on our synthetic data and fine-tuning on the COCO person class, relative to training from scratch on COCO. For each dataset size we show the results of the best-performing model.

| size of real data | bbox AP (scratch) | bbox AP (w/ pre-train) | Δ      | keypoint AP (scratch) | keypoint AP (w/ pre-train) | Δ      |
|-------------------|-------------------|------------------------|--------|-----------------------|----------------------------|--------|
| 641               | 13.82             | 42.58                  | +28.76 | 7.47                  | 46.40                      | +38.93 |
| 6411              | 37.82             | 49.04                  | +11.22 | 39.48                 | 55.21                      | +15.73 |
| 32057             | 52.15             | 55.04                  | +2.89  | 58.68                 | 63.38                      | +4.70  |
| 64115             | 56.73             | 57.44                  | +0.71  | 65.12                 | 66.83                      | +1.71  |
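
As a rough illustration of this two-stage recipe, here is a Detectron2 sketch using a Keypoint R-CNN model. This is not the authors' exact training configuration (schedules, solver settings, and the model variant differ); the synthetic dataset name and paths are placeholders.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical dataset registration -- paths are placeholders for the
# COCO-format JSON and image folder produced by the generator.
register_coco_instances("psp_synth_train", {}, "psp/annotations.json", "psp/images")

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))

# Stage 1: pre-train from random weights on the synthetic data.
cfg.DATASETS.TRAIN = ("psp_synth_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.WEIGHTS = ""  # empty string -> no pretrained initialization
cfg.OUTPUT_DIR = "./psp_pretrain"
# NOTE: training keypoints on a custom dataset also requires keypoint
# metadata (names, flip map) on the registered dataset; omitted for brevity.
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

# Stage 2: fine-tune the synthetic-pretrained weights on COCO person
# (Detectron2's builtin "keypoints_coco_2017_train" split).
cfg.DATASETS.TRAIN = ("keypoints_coco_2017_train",)
cfg.MODEL.WEIGHTS = "./psp_pretrain/model_final.pth"
cfg.OUTPUT_DIR = "./psp_finetune"
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```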

Unity Shader Graph randomizer for simulated clothing appearance diversity



Top row: our 3D human assets from RenderPeople with their original clothing textures.
Bottom row: using Shader Graph randomizers, we swap out clothing texture albedos, masks, and normals, yielding highly diverse clothing appearances without needing to swap out the clothing items themselves.
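
The 21,952 unique textures quoted earlier are simply the (albedo, mask, normal) triplets: 28 × 28 × 28 = 21,952. A small Python sketch of that arithmetic and of uniformly sampling one triplet per clothing item (the texture names are illustrative; the Unity implementation does this via Shader Graph):

```python
import itertools
import random

# The generator ships 28 albedos, 28 masks, and 28 normals; every
# (albedo, mask, normal) triplet is a distinct clothing appearance.
albedos = [f"albedo_{i:02d}" for i in range(28)]  # illustrative names
masks   = [f"mask_{i:02d}" for i in range(28)]
normals = [f"normal_{i:02d}" for i in range(28)]

combinations = list(itertools.product(albedos, masks, normals))
print(len(combinations))  # 28 * 28 * 28 = 21,952 unique textures

# Each iteration, a randomizer can assign one uniformly sampled triplet.
print(random.choice(combinations))
```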

Additional examples







Additional images generated with PeopleSansPeople. Notice the high diversity of lighting, camera perspectives, scene backgrounds and occluders, as well as human poses, proximity to each other and to the camera, and clothing texture variation. The domain randomization here uses naïvely chosen ranges with uniform distributions; varying the randomizer parameters can drastically change the look and structure of the scenes.

Citation

@article{ebadi2021peoplesanspeople,
    title={PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision},
    author={Salehe Erfanian Ebadi and You-Cyuan Jhang and Alex Zook and Saurav Dhakad 
            and Adam Crespi and Pete Parisi and Steven Borkman and Jonathan Hogins and Sujoy Ganguly},
    journal={arXiv},
    year={2021}
}

Source code

Code is available here

Unity Environment Template here

macOS and Linux binaries here

Unity Tutorial coming soon

License

PeopleSansPeople is licensed under the Apache License, Version 2.0. See LICENSE.md for the full license text.