Saliency and Granularity: Discovering Temporal Coherence for Video-Based Person Re-Identification

Cited by: 24
Authors
Chen, Cuiqun [1 ]
Ye, Mang [2 ]
Qi, Meibin [1 ,3 ]
Wu, Jingjing [1 ]
Liu, Yimin [1 ]
Jiang, Jianguo [1 ,3 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci,Inst Artificial Intelligence, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China
[3] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Convolution; Three-dimensional displays; Video sequences; Noise measurement; Data mining; Coherence; Person re-identification; temporal invariant features; relation learning; temporal spatial-relation module; temporal channel-relation module; ATTENTION;
DOI
10.1109/TCSVT.2022.3157130
Chinese Library Classification
TM [Electrotechnics]; TN [Electronics & Communication Technology]
Discipline Codes
0808; 0809
Abstract
Video-based person re-identification (ReID) matches the same person across video sequences that carry rich spatial and temporal information in complex scenes. Capturing discriminative information is highly challenging when occlusions and pose variations exist between frames. A key solution to this problem rests on extracting temporal-invariant features from video sequences. In this paper, we propose a novel method for discovering temporal coherence by designing a region-level saliency and granularity mining network (SGMN). First, to address the problem of varying noisy frames, we design a temporal spatial-relation module (TSRM) that locates frame-level salient regions, adaptively modeling temporal relations on the spatial dimension through a probe-buffer mechanism. This avoids information redundancy between frames and captures the informative cues of each frame. Second, a temporal channel-relation module (TCRM) is proposed to further mine the fine-grained information of each frame; it complements TSRM by concentrating on discriminative small-scale regions. TCRM exploits a one-and-rest difference relation on the channel dimension to enhance granularity features, yielding stronger robustness against misalignments. Finally, we evaluate SGMN on four representative video-based datasets, iLIDS-VID, MARS, DukeMTMC-VideoReID, and LS-VID, and the results demonstrate the effectiveness of the proposed method.
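The "one-and-rest" difference relation on the channel dimension can be illustrated with a minimal sketch. The function name, the `(C, D)` descriptor shape, and the use of a plain mean over the remaining channels are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def one_and_rest_channel_relation(features):
    """Sketch of a one-and-rest channel relation.

    For each channel c, contrast its descriptor with the mean of the
    other C-1 channels, producing a per-channel difference feature.

    features: array of shape (C, D) -- C channel descriptors of dim D
              (hypothetical layout for illustration).
    """
    C, _ = features.shape
    total = features.sum(axis=0, keepdims=True)   # (1, D): sum over channels
    rest_mean = (total - features) / (C - 1)      # mean of the remaining channels
    return features - rest_mean                   # "one" minus "rest"
```

In practice such a relation would feed an attention or gating branch; the sketch only shows the difference computation itself.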
Pages: 6100-6112
Page count: 13