Deep Learning Face Attributes in the Wild

Cited by: 4197
Authors
Liu, Ziwei [1, 3]
Luo, Ping [1, 3]
Wang, Xiaogang [2, 3]
Tang, Xiaoou [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen Key Lab Comp Vis & Pat Rec, Beijing, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015
Keywords
CASCADE;
DOI
10.1109/ICCV.2015.425
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently. LNet is pre-trained on massive general object categories for face localization, while ANet is pre-trained on massive face identities for attribute prediction. This framework not only outperforms the state of the art by a large margin, but also reveals valuable facts about learning face representations. (1) It shows how the performance of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images strongly indicate face locations. This fact enables training LNet for face localization with only image-level annotations, without the face bounding boxes or landmarks that are required by all attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and that such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained by a sparse linear combination of these concepts.
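The abstract's point (2), that LNet response maps indicate face locations, can be illustrated with a small sketch. This is not the paper's procedure: the relative-threshold rule, the helper names `box_from_response_map` and `crop`, and the toy response map are all assumptions made for illustration. It only shows how a region could be read off a 2-D activation map and cropped before being passed to a second-stage network such as ANet.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive indices


def box_from_response_map(
    response: List[List[float]], rel_threshold: float = 0.5
) -> Optional[Box]:
    """Return the tightest box around cells >= rel_threshold * max response.

    Hypothetical helper: the thresholding rule is an illustrative assumption,
    not the procedure described in the paper.
    """
    peak = max((v for row in response for v in row), default=0.0)
    if peak <= 0.0:
        return None  # no positive activation, so nothing to localize
    cutoff = rel_threshold * peak
    hits = [
        (r, c)
        for r, row in enumerate(response)
        for c, v in enumerate(row)
        if v >= cutoff
    ]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


def crop(image: List[List[float]], box: Box) -> List[List[float]]:
    """Cut the boxed region out of a 2-D image stored as nested lists."""
    top, left, bottom, right = box
    return [row[left : right + 1] for row in image[top : bottom + 1]]


# Toy 5x6 response map (made-up values) with a blob of high activation.
toy_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 1.0, 0.2, 0.0],
    [0.0, 0.0, 0.7, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
]
face_box = box_from_response_map(toy_map, rel_threshold=0.5)
face_crop = crop(toy_map, face_box)
```

In a real cascade the cropped region would come from the input image rather than from the response map itself, and the two would usually differ in resolution, so the box would need rescaling first.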
Pages: 3730-3738
Page count: 9