Learning discriminative features with a dual-constrained guided network for video-based person re-identification

Cited: 5
Authors
Chen, Cuiqun [1 ]
Qi, Meibin [1 ,2 ]
Huang, Guanghong [1 ,3 ]
Wu, Jingjing [1 ]
Jiang, Jianguo [1 ]
Li, Xiaohong [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Anhui, Peoples R China
[2] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei, Peoples R China
[3] Anhui Siliepoch Technol Co Ltd, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video-based person re-identification; Discriminative features; Dual-constrained guided network; Frame-level constraint; Sequence-level constraint; Attention;
DOI
10.1007/s11042-021-11072-y
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Video-based person re-identification (ReID) aims to match pedestrians in a large video gallery across different cameras. However, interference factors common in real-world scenarios, such as occlusion, pose variations and appearance changes, make ReID a challenging task. Most existing methods learn the features of each frame independently, without exploiting the complementary information between frames, so the extracted frame features lack the discriminability needed to handle these problems. In this paper, we propose a novel dual-constrained guided network (DCGN) that captures discriminative features by modeling the relations across frames in two steps. First, to learn frame-level discriminative features, we design a frame-constrained module (FCM) that learns channel attention weights by combining intra-frame and inter-frame information. Second, we propose a sequence-constrained module (SCM) that determines the importance of each frame in a video: it models the relations between frame-level and sequence-level features, alleviating frame redundancy from a global perspective. We conduct comparison experiments on four representative datasets, i.e., MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID2011. Rank-1 accuracy reaches 89.65%, 95.35%, 78.51% and 90.82% on the four datasets respectively, outperforming the second-best method by 2.35%, 1.35%, 3.41% and 2.72%.
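The abstract only names the two modules, so the following is a deliberately simplified, hypothetical sketch of the idea behind them, not the authors' actual DCGN (which operates on CNN feature maps with learned parameters). Here each frame is reduced to a plain list of per-channel descriptors; the FCM-style step reweights channels using a softmax over intra-frame values combined with the inter-frame channel mean, and the SCM-style step weights whole frames by their affinity to the mean sequence feature. All function names and the additive intra+inter combination are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def frame_constrained_weights(frames):
    """Toy FCM-style step (hypothetical): per-frame channel attention built
    from intra-frame values plus an inter-frame channel mean.
    frames: list of per-frame channel descriptors (each a list of C floats)."""
    C = len(frames[0])
    # inter-frame cue: mean of each channel across all frames
    inter = [sum(f[c] for f in frames) / len(frames) for c in range(C)]
    reweighted = []
    for f in frames:
        # combine intra-frame value with the inter-frame mean, normalise to weights
        attn = softmax([f[c] + inter[c] for c in range(C)])
        # rescale by C so that uniform attention leaves channels unchanged
        reweighted.append([f[c] * attn[c] * C for c in range(C)])
    return reweighted

def sequence_constrained_pooling(frames):
    """Toy SCM-style step (hypothetical): score each frame by its dot-product
    affinity to the sequence-level (mean) feature, then pool with those weights."""
    C = len(frames[0])
    n = len(frames)
    seq = [sum(f[c] for f in frames) / n for c in range(C)]
    scores = [sum(f[c] * seq[c] for c in range(C)) for f in frames]
    w = softmax(scores)
    return [sum(w[i] * frames[i][c] for i in range(n)) for c in range(C)]
```

Under this sketch, a redundant or uninformative frame scores low against the sequence feature and contributes little to the pooled representation, which is the global deduplication effect the SCM is described as providing.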
Pages: 28673-28696
Page count: 24