Incorporating texture and silhouette for video-based person re-identification

Cited by: 7
Authors
Bai, Shutao [1, 2]
Chang, Hong [1, 2]
Ma, Bingpeng [2]
Affiliations
[1] Chinese Acad Sci, CAS, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Keywords
Silhouette; Relational modeling; Decomposition
DOI
10.1016/j.patcog.2024.110759
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Silhouette is an effective modality in video-based person re-identification (ReID) since it contains features (e.g., stature and gait) complementary to the RGB modality. However, recent silhouette-assisted methods have not fully explored the spatial-temporal relations within each modality or considered the cross-modal complementarity in fusion. To address these two issues, we propose a Complete Relational Framework that includes two key components. The first component, Spatial-Temporal Relational Module (STRM), explores the spatiotemporal relations. STRM decomposes the video's spatiotemporal context into local/fine-grained and global/semantic aspects, modeling them sequentially to enhance the representation of each modality. The second component, Modality-Channel Relational Module (MCRM), explores the complementarity between RGB and silhouette videos. MCRM aligns two modalities semantically and multiplies them to capture complementary interrelations. With these two modules focusing on intra- and cross-modal relationships, our method achieves superior results across multiple benchmarks with minimal additional parameters and FLOPs. Code and models are available at https://github.com/baist/crf.
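The "align, then multiply" idea that the abstract attributes to MCRM can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that alignment is a plain linear projection of each modality into a shared channel space and that the element-wise product is added back to the aligned RGB feature as a residual. The names `linear` and `multiplicative_fusion`, the weight matrices, and all dimensions are hypothetical.

```python
import random


def linear(x, weight):
    """Project vector x (length d_in) with weight (d_out rows of length d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def multiplicative_fusion(rgb_feat, sil_feat, w_rgb, w_sil):
    """Hypothetical fusion, not the paper's MCRM.

    1. Project both modalities into a shared channel space (assumed alignment).
    2. Multiply them element-wise to get a cross-modal interaction term.
    3. Add the interaction to the aligned RGB feature (assumed residual).
    """
    rgb_aligned = linear(rgb_feat, w_rgb)
    sil_aligned = linear(sil_feat, w_sil)
    interaction = [r * s for r, s in zip(rgb_aligned, sil_aligned)]
    return [r + i for r, i in zip(rgb_aligned, interaction)]


# Toy dimensions chosen only for illustration.
rng = random.Random(0)
d_rgb, d_sil, d_shared = 8, 4, 6
w_rgb = [[rng.uniform(-1, 1) for _ in range(d_rgb)] for _ in range(d_shared)]
w_sil = [[rng.uniform(-1, 1) for _ in range(d_sil)] for _ in range(d_shared)]
rgb_feat = [rng.uniform(-1, 1) for _ in range(d_rgb)]
sil_feat = [rng.uniform(-1, 1) for _ in range(d_sil)]

fused = multiplicative_fusion(rgb_feat, sil_feat, w_rgb, w_sil)
print(len(fused))  # one value per shared channel
```

In this sketch the product term is large only in channels where both modalities respond, which is one simple way a multiplicative interaction can emphasize complementary evidence.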
Pages: 13