Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition

Cited by: 288

Authors:

Ding, Changxing [1]

Tao, Dacheng [2, 3]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, 381 Wushan Rd, Guangzhou 510000, Guangdong, Peoples R China

[2] Univ Sydney, UBTech Sydney Artificial Intelligence Inst, Darlington, NSW 2008, Australia

[3] Univ Sydney, Fac Engn & Informat Technol, Sch Informat Technol, Darlington, NSW 2008, Australia

Source:

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018, Vol. 40, No. 4

Funding:

Australian Research Council

Keywords:

Video-based face recognition; video surveillance; blur- and pose-robust representations; convolutional neural networks;

DOI:

10.1109/TPAMI.2017.2700390

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
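The first step in the abstract, artificially blurring clear still images so that the training set contains both sharp and blurred copies, can be illustrated with a minimal sketch. The abstract does not say which blur kernels or parameters are used, so the Gaussian kernel, the sigma values, and the function names below (`gaussian_kernel`, `blur_image`, `augment_with_blur`) are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel (an assumed blur model)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur_image(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image, same output shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(image.astype(np.float64), r, mode="edge")
    # Horizontal pass, then vertical pass.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def augment_with_blur(images, sigmas, rng):
    """Return each sharp image followed by one randomly blurred copy."""
    out = []
    for img in images:
        out.append(img.astype(np.float64))
        out.append(blur_image(img, rng.choice(sigmas)))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stills = [rng.random((32, 32)) for _ in range(4)]  # stand-ins for face crops
    train = augment_with_blur(stills, sigmas=[1.0, 2.0, 3.0], rng=rng)
    print(len(train), train[1].shape)
```

Mixing sharp and blurred copies of the same image in one training set is what lets a network see both conditions under the same identity label; a real pipeline would likely also need other blur types, such as motion blur, which this sketch leaves out.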

Pages: 1002-1014

Number of pages: 13
