Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition

Cited by: 288
Authors
Ding, Changxing [1]
Tao, Dacheng [2, 3]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, 381 Wushan Rd, Guangzhou 510000, Guangdong, Peoples R China
[2] Univ Sydney, UBTech Sydney Artificial Intelligence Inst, Darlington, NSW 2008, Australia
[3] Univ Sydney, Fac Engn & Informat Technol, Sch Informat Technol, Darlington, NSW 2008, Australia
Funding
Australian Research Council;
Keywords
Video-based face recognition; video surveillance; blur- and pose-robust representations; convolutional neural networks;
DOI
10.1109/TPAMI.2017.2700390
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, the CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance the robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
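The first technique in the abstract is augmenting clear still images with artificially blurred copies. The minimal pure-Python sketch below illustrates it under these assumptions:
- The blur is isotropic Gaussian blur.
- The sigma for each copy is drawn from a hypothetical set (1.0, 2.0, 3.0).

This record does not state the paper's actual blur models or parameters, and every function name here is hypothetical.

```python
import math
import random
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gaussian_kernel_1d(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _filter_rows(img: Image, kernel: List[float]) -> Image:
    """Filter each row with a symmetric 1-D kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[k + radius] * row[min(max(j + k, 0), n - 1)]
                for k in range(-radius, radius + 1))
            for j in range(n)
        ])
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel_1d(sigma)
    rows_done = _filter_rows(img, kernel)
    return _transpose(_filter_rows(_transpose(rows_done), kernel))


def augment_with_blur(stills: List[Image],
                      sigmas: Sequence[float] = (1.0, 2.0, 3.0),
                      seed: int = 0) -> List[Image]:
    """Return the clear stills followed by one blurred copy of each.

    The sigma values are a hypothetical choice, not taken from the paper.
    """
    rng = random.Random(seed)
    blurred = [gaussian_blur(img, rng.choice(sigmas)) for img in stills]
    return stills + blurred
```

Training on the combined list mirrors the idea in the abstract. The network sees each identity both sharp and blurred, so features that change with blur are less useful for the objective.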
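For the third technique, this record does not say how the authors' improved triplet loss differs from the standard one. The sketch below therefore shows only the conventional hinge-form triplet loss on L2-normalized embeddings, the baseline that such variants modify. The margin of 0.2 and all function names are assumptions for illustration, not the TBE-CNN formulation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Standard triplet loss: max(0, ||a - p||^2 - ||a - n||^2 + margin).

    This is the conventional baseline, NOT the paper's improved variant;
    the default margin is an assumed value.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)
```

A triplet that already satisfies the margin contributes zero loss. Training signal therefore comes only from triplets whose negative lies too close to the anchor relative to the positive.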
Pages: 1002-1014
Number of pages: 13