DigiFace-1M: 1 Million Digital Face Images for Face Recognition

Gwangbin Bae; Martin de La Gorce; Tadas Baltrusaitis; Charlie Hewitt; Dong Chen; Julien Valentin; Roberto Cipolla; Jingjing Shen

Winter Conference on Applications of Computer Vision | January 2023

Organized by IEEE/CVF

State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on the Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc.) and often contain label noise. More importantly, the face images are collected without explicit consent, raising ethical concerns. To avoid such problems, we introduce a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline. We first demonstrate that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap. Having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories, and textures) affects the accuracy. Compared to SynFace, a recent method trained on GAN-generated synthetic faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). By fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to methods trained on millions of real face images.
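
The 52.5% figure is a relative reduction in error rate, not a difference in accuracy points. A minimal sketch of that arithmetic follows, using only the two LFW accuracies quoted in the abstract; the function name `relative_error_reduction` is a hypothetical helper for illustration and is not part of any DigiFace-1M release.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the relative drop in error rate, in percent.

    Both arguments are accuracies in percent (0 to 100).
    """
    baseline_err = 100.0 - baseline_acc  # SynFace error on LFW: 8.07%
    new_err = 100.0 - new_acc            # DigiFace-1M error on LFW: 3.83%
    return 100.0 * (baseline_err - new_err) / baseline_err


# Accuracies quoted in the abstract (LFW, in percent).
reduction = relative_error_reduction(91.93, 96.17)
print(f"{reduction:.1f}%")  # prints "52.5%"
```

The error rate falls from 8.07% to 3.83%, so the absolute gain is 4.24 accuracy points, while the relative reduction is 4.24 / 8.07, or about 52.5%.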