Improving Night-Time Pedestrian Retrieval With Distribution Alignment and Contextual Distance

Cited by: 33
Authors
Ye, Mang [1 ]
Cheng, Yi [2 ]
Lan, Xiangyuan [1 ]
Zhu, Hongyuan [2 ]
Affiliations
[1] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[2] ASTAR, I2R, Singapore 138632, Singapore
Keywords
Cameras; Training; Informatics; Task analysis; Face recognition; Neural networks; Visualization; Cross modality; contextual distance; distribution alignment; pedestrian retrieval; coupled dictionary; face; representation
DOI
10.1109/TII.2019.2946030
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Night-time pedestrian retrieval is a cross-modality retrieval task that matches person images between day-time visible images and night-time thermal images. It is very challenging due to the modality difference, camera variations, and person variations, but it plays an important role in night-time video surveillance. Existing cross-modality retrieval methods usually focus on learning modality-sharable feature representations to bridge the modality gap. In this article, we propose to exploit auxiliary information to improve retrieval performance, and the improvement is consistent across different baseline loss functions. The auxiliary information has two major parts: the cross-modality feature distribution and contextual information. The former aligns the feature distributions of the two modalities to improve performance, and the latter refines the cross-modality distance measurement with contextual information. We also demonstrate that abundant annotated visible pedestrian images, which are easily accessible, help cross-modality pedestrian retrieval as well. The proposed method has two notable features: the auxiliary information requires no additional human intervention or annotation, and the discriminative feature representations are learned in an end-to-end deep learning manner. Extensive experiments on two cross-modality pedestrian retrieval datasets demonstrate the superiority of the proposed method, which achieves much better performance than the state of the art.
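The first auxiliary component aligns the feature distributions of the visible and thermal modalities. The abstract does not specify the alignment criterion used, so the sketch below is purely illustrative: it uses the maximum mean discrepancy (MMD) with a Gaussian kernel, a common way to measure how differently two sets of feature vectors are distributed (a value near zero means the two modalities occupy the same region of feature space). All feature values below are made up for demonstration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors (tuples of floats)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Squared maximum mean discrepancy between two sets of feature vectors."""
    m, n = len(X), len(Y)
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / (m * m)
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / (n * n)
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2 * kxy

# Identical distributions give zero MMD; a shifted one gives a large value.
visible = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]
thermal_far = [(2.0, 2.0), (2.1, 1.9), (1.9, 2.1)]
print(mmd2(visible, visible))      # 0.0
print(mmd2(visible, thermal_far))  # clearly > 0
```

In a training pipeline, a term such as `mmd2(visible_feats, thermal_feats)` could be added to a baseline loss so that minimizing it pulls the two modality distributions together; the actual alignment loss used in the paper may differ from this sketch.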
Pages: 615-624
Number of pages: 10