Extended Global-Local Representation Learning for Video Person Re-Identification

Cited by: 7
Authors
Song, Wanru [1 ]
Wu, Yahong [1 ]
Zheng, Jieying [1 ]
Chen, Changhong [1 ,2 ]
Liu, Feng [1 ,2 ]
Affiliations
[1] Nanjing University of Posts and Telecommunications, Jiangsu Key Laboratory of Image Processing and Image Communication, Nanjing 210003, Jiangsu, People's Republic of China
[2] Nanjing University of Posts and Telecommunications, Ministry of Education, Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Nanjing 210003, Jiangsu, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video; appearance representation;
DOI
10.1109/ACCESS.2019.2937974
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, person re-identification (ReID) has become a research hotspot in computer vision and has received extensive attention in the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification, namely, the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, the E-GLRN network extracts holistic and local features simultaneously. Specifically, for global feature learning, we adopt a channel attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks, which together form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module relies on Bi-LSTM networks to extract key local information. To obtain local features more effectively, our work defines the concept of a "main image group" by selecting three representative frames. The local feature representation of a video is obtained by exploiting the spatial contextual and appearance information of this group. The local and global features extracted in this paper are complementary and are further combined into a discriminative and robust feature representation of the video sequence. Extensive experiments are conducted on three video-based ReID datasets: iLIDS-VID, PRID2011, and MARS. The experimental results demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
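To make the two-branch design described in the abstract concrete, the sketch below shows a minimal PyTorch-style model with the same structure: a channel-attention CNN feeding a Bi-LSTM for the global branch, and a Bi-LSTM over a three-frame "main image group" for the local branch. The ResNet-50 backbone, the squeeze-and-excitation-style attention variant, all dimensions, the first/middle/last frame-selection heuristic, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the global-local two-branch design described in the
# abstract (not the authors' code). Assumed: PyTorch, a ResNet-50 backbone,
# squeeze-and-excitation-style channel attention, and a first/middle/last
# heuristic standing in for the paper's "main image group" selection.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (assumed variant)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # squeeze: (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # re-weight channels


class GlobalLocalSketch(nn.Module):
    """Two-branch video feature extractor loosely following the abstract."""
    def __init__(self, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.attn = ChannelAttention(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Global branch: Bi-LSTM over per-frame CNN features (CNN-LSTM module).
        self.global_lstm = nn.LSTM(feat_dim, hidden, bidirectional=True,
                                   batch_first=True)
        # Local branch: Bi-LSTM over the three-frame "main image group".
        self.local_lstm = nn.LSTM(feat_dim, hidden, bidirectional=True,
                                  batch_first=True)

    def frame_features(self, clip):             # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1))        # (B*T, C, h, w)
        f = self.pool(self.attn(f)).flatten(1)  # (B*T, C)
        return f.view(b, t, -1)                 # (B, T, C)

    def forward(self, clip):
        feats = self.frame_features(clip)       # (B, T, C)
        g, _ = self.global_lstm(feats)          # temporal modeling, all frames
        global_feat = g.mean(dim=1)             # (B, 2*hidden)
        # Assumed selection: first, middle, and last frame stand in for the
        # paper's three representative frames.
        t = feats.size(1)
        group = feats[:, [0, t // 2, t - 1]]    # (B, 3, C)
        l, _ = self.local_lstm(group)
        local_feat = l.mean(dim=1)              # (B, 2*hidden)
        # Assumed fusion: concatenate the complementary global and local
        # features into one video-level representation.
        return torch.cat([global_feat, local_feat], dim=1)


if __name__ == "__main__":
    model = GlobalLocalSketch()
    clip = torch.randn(2, 8, 3, 256, 128)       # 2 clips of 8 frames each
    print(model(clip).shape)                    # torch.Size([2, 2048])
```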
Pages: 122684-122696
Page count: 13
References
55 in total
[1] Anonymous, 2017, arXiv:1703.07220.
[2] Anonymous, CoRR, abs/1711.10658.
[3] Anonymous, IEEE Transactions on Circuits and Systems.
[4] Chen, Guangyi; Lu, Jiwen; Yang, Ming; Zhou, Jie. Spatial-Temporal Attention-Aware Learning for Video-Based Person Re-Identification. IEEE Transactions on Image Processing, 2019, 28(9): 4192-4205.
[5] Chen, W. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 403.
[6] Dai, Ju; Zhang, Pingping; Wang, Dong; Lu, Huchuan; Wang, Hongyu. Video Person Re-Identification by Temporal Residual Learning. IEEE Transactions on Image Processing, 2019, 28(3): 1366-1377.
[7] Gong, S. Advances in Computer Vision and Pattern Recognition, 2014: 1. DOI: 10.1007/978-1-4471-6296-4.
[8] Gray, Douglas; Tao, Hai. Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. Computer Vision - ECCV 2008, Part I, Proceedings, 2008, 5302: 262-275.
[9] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[10] Hirzer, M. Lecture Notes in Computer Science, 2011, 6688: 91. DOI: 10.1007/978-3-642-21227-7_9.