Extended Global-Local Representation Learning for Video Person Re-Identification

Cited by: 7
Authors
Song, Wanru [1 ]
Wu, Yahong [1 ]
Zheng, Jieying [1 ]
Chen, Changhong [1 ,2 ]
Liu, Feng [1 ,2 ]
Affiliations
[1] Nanjing University of Posts and Telecommunications, Jiangsu Key Laboratory of Image Processing and Image Communication, Nanjing 210003, Jiangsu, People's Republic of China
[2] Nanjing University of Posts and Telecommunications, Ministry of Education, Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Nanjing 210003, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video; appearance representation;
DOI
10.1109/ACCESS.2019.2937974
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, person re-identification (ReID) has become a research hotspot in computer vision and has received extensive attention in the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification, namely, the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, the E-GLRN network extracts holistic and local features simultaneously. Specifically, for global feature learning, we adopt a channel attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks, which together form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module relies on Bi-LSTM networks to extract key local information. To obtain local features more effectively, our work defines the concept of a "main image group" by selecting three representative frames. The local feature representation of a video is obtained by exploiting the spatial contextual and appearance information of this group. The local and global features extracted in this paper are complementary and are further combined into a discriminative and robust feature representation of the video sequence. Extensive experiments are conducted on three video-based ReID datasets: iLIDS-VID, PRID2011, and MARS. The experimental results demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
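To make the two-branch design described in the abstract concrete, the sketch below shows a minimal PyTorch-style model with the same structure: a channel-attention CNN feeding a Bi-LSTM for the global branch, and a Bi-LSTM over a three-frame "main image group" for the local branch. The ResNet-50 backbone, the squeeze-and-excitation-style attention variant, all dimensions, the first/middle/last frame-selection heuristic, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the global-local two-branch design described in the
# abstract (not the authors' code). Assumed: PyTorch, a ResNet-50 backbone,
# squeeze-and-excitation-style channel attention, and a first/middle/last
# heuristic standing in for the paper's "main image group" selection.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (assumed variant)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # squeeze: (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # re-weight channels


class GlobalLocalSketch(nn.Module):
    """Two-branch video feature extractor loosely following the abstract."""
    def __init__(self, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.attn = ChannelAttention(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Global branch: Bi-LSTM over per-frame CNN features (CNN-LSTM module).
        self.global_lstm = nn.LSTM(feat_dim, hidden, bidirectional=True,
                                   batch_first=True)
        # Local branch: Bi-LSTM over the three-frame "main image group".
        self.local_lstm = nn.LSTM(feat_dim, hidden, bidirectional=True,
                                  batch_first=True)

    def frame_features(self, clip):             # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1))        # (B*T, C, h, w)
        f = self.pool(self.attn(f)).flatten(1)  # (B*T, C)
        return f.view(b, t, -1)                 # (B, T, C)

    def forward(self, clip):
        feats = self.frame_features(clip)       # (B, T, C)
        g, _ = self.global_lstm(feats)          # temporal modeling, all frames
        global_feat = g.mean(dim=1)             # (B, 2*hidden)
        # Assumed selection: first, middle, and last frame stand in for the
        # paper's three representative frames.
        t = feats.size(1)
        group = feats[:, [0, t // 2, t - 1]]    # (B, 3, C)
        l, _ = self.local_lstm(group)
        local_feat = l.mean(dim=1)              # (B, 2*hidden)
        # Assumed fusion: concatenate the complementary global and local
        # features into one video-level representation.
        return torch.cat([global_feat, local_feat], dim=1)


if __name__ == "__main__":
    model = GlobalLocalSketch()
    clip = torch.randn(2, 8, 3, 256, 128)       # 2 clips of 8 frames each
    print(model(clip).shape)                    # torch.Size([2, 2048])
```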
Pages: 122684-122696
Page count: 13
References
55 in total
[1] Anonymous, 2017, arXiv:1703.07220.
[2] Anonymous, CoRR, abs/1711.10658.
[3] Anonymous, IEEE Transactions on Circuits and Systems.
[4] Chen, Guangyi; Lu, Jiwen; Yang, Ming; Zhou, Jie. Spatial-Temporal Attention-Aware Learning for Video-Based Person Re-Identification. IEEE Transactions on Image Processing, 2019, 28(9): 4192-4205.
[5] Chen, W. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 403.
[6] Dai, Ju; Zhang, Pingping; Wang, Dong; Lu, Huchuan; Wang, Hongyu. Video Person Re-Identification by Temporal Residual Learning. IEEE Transactions on Image Processing, 2019, 28(3): 1366-1377.
[7] Gong, S. Advances in Computer Vision and Pattern Recognition, 2014: 1. DOI: 10.1007/978-1-4471-6296-4.
[8] Gray, Douglas; Tao, Hai. Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. Computer Vision - ECCV 2008, Part I, Proceedings, 2008, 5302: 262-275.
[9] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[10] Hirzer, M. Lecture Notes in Computer Science, 2011, 6688: 91. DOI: 10.1007/978-3-642-21227-7_9.