Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition

Cited by: 220
Authors
He, Ran [1]
Wu, Xiang [1]
Sun, Zhenan [1]
Tan, Tieniu [1]
Affiliations
[1] Univ Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Ctr Res Intelligent Percept & Comp, CAS,CASIA,Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Heterogeneous face recognition; VIS-NIR face matching; feature representation; SPECTRAL REGRESSION; COUPLED DICTIONARY;
DOI
10.1109/TPAMI.2018.2842770
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in forensics, security and commercial sectors. However, HFR is more challenging than traditional face recognition because of the large intra-class variation among heterogeneous face images and the limited availability of training samples of cross-modality face image pairs. This paper proposes a novel Wasserstein convolutional neural network (WCNN) approach for learning invariant features between near-infrared (NIR) and visual (VIS) face images (i.e., NIR-VIS face recognition). The low-level layers of the WCNN are trained with widely available face images in the VIS spectrum, and the high-level layer is divided into three parts: the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim at learning modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning is performed to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representations of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected WCNN layers to reduce the size of the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments using three challenging NIR-VIS face recognition databases demonstrate the superiority of the WCNN method over state-of-the-art methods.
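To make the distribution-matching idea in the abstract concrete, the sketch below computes a squared 2-Wasserstein distance between NIR and VIS feature sets. It rests on an assumption that the abstract does not state: each modality's shared-layer features are summarized by a Gaussian with diagonal covariance, for which the distance reduces to the squared gap between means plus the squared gap between standard deviations. The function name and the toy feature values are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def diag_gaussian_w2_squared(nir_feats, vis_feats):
    """Squared 2-Wasserstein distance between diagonal-Gaussian fits.

    Each argument is a list of equal-length feature vectors (lists of floats).
    Assumption: features in each modality are modeled as a Gaussian with
    diagonal covariance, so the distance decomposes per dimension.
    """
    dims = len(nir_feats[0])
    total = 0.0
    for d in range(dims):
        nir_col = [f[d] for f in nir_feats]
        vis_col = [f[d] for f in vis_feats]
        # Per-dimension gap between the fitted means and standard deviations.
        mean_gap = fmean(nir_col) - fmean(vis_col)
        std_gap = pstdev(nir_col) - pstdev(vis_col)
        total += mean_gap ** 2 + std_gap ** 2
    return total


# Toy shared-layer features (made-up numbers, for illustration only).
nir = [[0.0, 1.0], [2.0, 3.0]]
vis = [[1.0, 1.0], [3.0, 3.0]]
w2_sq = diag_gaussian_w2_squared(nir, vis)
```

In a training loop, a quantity of this kind would be added to the recognition loss so that minimizing it pulls the NIR and VIS feature statistics together; how the paper weights and optimizes that term is not specified in this record.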
Pages: 1761-1773
Page count: 13