A Light CNN for Deep Face Representation With Noisy Labels

Cited by: 765

Authors:

Wu, Xiang [1,2,3,4]

He, Ran [1,2,3,4]

Sun, Zhenan [1,2,3,4]

Tan, Tieniu [1,2,3,4]

Affiliations:

[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100864, Peoples R China

[2] Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Inst Automat, Beijing 100864, Peoples R China

[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100864, Peoples R China

[4] Univ Chinese Acad Sci, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2018 / Vol. 13 / Issue 11

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural network; face recognition; CLASSIFICATION; RECOGNITION;

DOI:

10.1109/TIFS.2018.2833032

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

The volume of convolutional neural network (CNN) models proposed for face recognition has been continuously growing larger to better fit the large amount of training data. When training data are obtained from the Internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called max-feature-map (MFM), into each convolutional layer of the CNN. Different from maxout activation, which uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance while reducing the number of parameters and computational costs. Finally, a semantic bootstrapping method is proposed to make the prediction of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning.
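The competitive relationship between two feature maps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `max_feature_map` is hypothetical, and the choice to pair channel k with channel k + N (where 2N is the number of input feature maps) is an assumption, since the abstract only says that MFM selects between two feature maps.

```python
def max_feature_map(feature_maps):
    """Sketch of an MFM-style activation on a list of 2-D feature maps.

    Assumption: channel k competes with channel k + N, where 2N is the
    number of input maps. The output has N maps, each the element-wise
    maximum of one competing pair.
    """
    if len(feature_maps) % 2 != 0:
        raise ValueError("MFM needs an even number of input feature maps")
    half = len(feature_maps) // 2
    output = []
    for k in range(half):
        first, second = feature_maps[k], feature_maps[k + half]
        # Keep the stronger response at every spatial position.
        output.append([
            [max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return output


# Four 2x2 input maps are reduced to two 2x2 output maps.
maps = [
    [[1.0, -2.0], [0.5, 3.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-1.0, 4.0], [0.5, 2.0]],
    [[1.0, -1.0], [2.0, -3.0]],
]
compact = max_feature_map(maps)
```

Under this assumption the activation halves the number of channels, which is one way such a layer could contribute to a compact representation; unlike ReLU it has no fixed threshold and simply keeps the larger of the two competing responses.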

Citation

Pages: 2884-2896

Page count: 13

