Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model Without Manual Annotation

Cited by: 12

Authors:

Huang, Qingqiu [1]

Yang, Lei [1]

Huang, Huaiyi [1]

Wu, Tong [2]

Lin, Dahua [1]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] Tsinghua Univ, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2020, PT XVII | 2020 / Vol. 12362

Keywords:

REPRESENTATION; CLASSIFICATION;

DOI:

10.1007/978-3-030-58520-4_9

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Advances over the past several years have pushed face recognition performance to a remarkable level. This success is built, to a large extent, on millions of annotated samples. However, as we endeavor to take performance to the next level, the reliance on annotated data becomes a major obstacle. We explore an alternative approach, namely training on captioned images, as an attempt to mitigate this difficulty. Captioned images are widely available on the web, and their captions often contain the names of the subjects in the images. Hence, an effective method for leveraging such data would significantly reduce the need for human annotation. However, an important challenge must be tackled along the way: the names in the captions are often noisy and ambiguous, especially when there are multiple names in a caption or multiple people in a photo. In this work, we propose a simple yet effective method that trains a face recognition model by progressively expanding the labeled set via both selective propagation and caption-driven expansion. We build a large-scale dataset of captioned images, which contains 6.3M faces from 305K subjects. Our experiments show that, using the proposed method, we can train a state-of-the-art face recognition model without manual annotation (99.65% on LFW). This shows the great potential of caption-supervised face recognition.


Pages: 139-155

Page count: 17

