Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature

Cited: 5
Authors
Sun, Rui [1 ]
Huang, Qiheng [1 ]
Xia, Miaomiao [1 ]
Zhang, Jun [1 ]
Affiliation
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Feicui Rd 420, Hefei 230000, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
person re-identification; end-to-end architecture; appearance-temporal features; Siamese network; pivotal frames;
DOI
10.3390/s18113669
CLC Number
O65 [Analytical Chemistry];
Subject Classification Codes
070302 ; 081704 ;
Abstract
Video-based person re-identification is an important task in multi-camera visual sensor networks, challenged by lighting variation, low-resolution images, background clutter, occlusion, and similarity of human appearance. In this paper, we propose a video-based person re-identification method: an end-to-end learning architecture with a hybrid deep appearance-temporal feature. It can learn the appearance features of pivotal frames, the temporal features, and an independent distance metric for each type of feature. The architecture consists of a two-stream deep feature structure and two Siamese networks. For the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to obtain the appearance information of persons, and use one of the two Siamese networks to learn the similarity of the appearance features of person pairs. To exploit temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn persons' temporal features and the distances between pairs of features. In addition, we select the pivotal frames of each video as inputs to the Inception-V3 network in the Two-branch Appearance Feature sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experiments on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets showed that the proposed architecture reached Rank-1 accuracies of 79%, 59%, and 72%, respectively, outperforming state-of-the-art algorithms. It also improved the feature representation ability for persons.
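The abstract describes matching a pair of video sequences with two Siamese branches, one over appearance features and one over optical-flow temporal features, each with its own distance metric whose outputs are then combined for ranking. A minimal sketch of that two-stream distance fusion, not the authors' code (all function names, the Euclidean metric, and the fixed fusion weight `w_app` are illustrative assumptions; the paper learns its metrics and features end-to-end):

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (guarding the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def siamese_distance(f_a, f_b):
    """Euclidean distance between two L2-normalised feature vectors,
    standing in for one learned Siamese branch's metric."""
    a, b = l2_normalize(f_a), l2_normalize(f_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_distance(app_a, app_b, tmp_a, tmp_b, w_app=0.5):
    """Fuse the appearance-stream and temporal-stream distances.

    app_*: appearance features (e.g., from pivotal frames);
    tmp_*: temporal features (e.g., from optical flow);
    w_app: assumed fusion weight for the appearance branch.
    """
    d_app = siamese_distance(app_a, app_b)
    d_tmp = siamese_distance(tmp_a, tmp_b)
    return w_app * d_app + (1.0 - w_app) * d_tmp

# Identical probe/gallery features in both streams give distance 0;
# a gallery sequence with matching features therefore ranks first.
same = hybrid_distance([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
diff = hybrid_distance([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
```

In the paper, the per-stream metrics are learned by the two Siamese networks rather than fixed; the sketch only shows how two independently computed distances can be combined into one ranking score.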
Pages: 21