SANet: Statistic Attention Network for Video-Based Person Re-Identification

Cited by: 22
Authors
Bai, Shutao [1, 2]
Ma, Bingpeng [2]
Chang, Hong [1, 2]
Huang, Rui [3, 4]
Shan, Shiguang [1, 2]
Chen, Xilin [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
[3] Chinese Univ Hong Kong, Sch Sci & Engn, Shenzhen 518172, Guangdong, Peoples R China
[4] Shenzhen Inst Artificial Intelligence & Robot, Shenzhen 518172, Guangdong, Peoples R China
Keywords
Feature extraction; Task analysis; Computational modeling; Visualization; Video sequences; Fuses; Computer science; Person re-identification; self-attention; long-range dependencies; high-order statistics
DOI
10.1109/TCSVT.2021.3119983
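Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract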
Capturing long-range dependencies during feature extraction is crucial for video-based person re-identification (re-id), since it helps to tackle challenging problems such as occlusion and dramatic pose variation. Moreover, capturing subtle differences, such as bags and glasses, is indispensable for distinguishing similar pedestrians. In this paper, we propose a novel and effective Statistic Attention (SA) block that captures both long-range dependencies and subtle differences. The SA block leverages high-order statistics of feature maps, which contain both long-range and high-order information. By modeling relations with these statistics, the SA block can explicitly capture long-range dependencies with lower time complexity. In addition, high-order statistics usually concentrate on details of feature maps and can perceive subtle differences between pedestrians, so the SA block can discriminate between pedestrians that differ only subtly. Furthermore, this lightweight block can be conveniently inserted into existing deep neural networks at any depth to form the Statistic Attention Network (SANet). To evaluate its performance, we conduct extensive experiments on two challenging video re-id datasets, showing that SANet outperforms state-of-the-art methods. To show the generalizability of SANet, we also evaluate it on three image re-id datasets and two more general image classification datasets, including ImageNet. The source code is available at http://vipl.ict.ac.cn/resources/codes/code/SANet_code.zip.
Pages: 3866-3879
Number of pages: 14
Related papers
81 in total
[1]  
Bai S, 2017, AAAI CONF ARTIF INTE, P1281
[2]   Mixed High-Order Attention Network for Person Re-Identification [J].
Chen, Binghui ;
Deng, Weihong ;
Hu, Jiani .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :371-381
[3]   Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding [J].
Chen, Dapeng ;
Li, Hongsheng ;
Xiao, Tong ;
Yi, Shuai ;
Wang, Xiaogang .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :CP1-CP99
[4]   Spatial-Temporal Attention-Aware Learning for Video-Based Person Re-Identification [J].
Chen, Guangyi ;
Lu, Jiwen ;
Yang, Ming ;
Zhou, Jie .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (09) :4192-4205
[5]   SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J].
Chen, Long ;
Zhang, Hanwang ;
Xiao, Jun ;
Nie, Liqiang ;
Shao, Jian ;
Liu, Wei ;
Chua, Tat-Seng .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6298-6306
[6]   Salience-Guided Cascaded Suppression Network for Person Re-identification [J].
Chen, Xuesong ;
Fu, Canmiao ;
Zhao, Yong ;
Zheng, Feng ;
Song, Jingkuan ;
Ji, Rongrong ;
Yang, Yi .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :3297-3307
[7]   Graph-Based Global Reasoning Networks [J].
Chen, Yunpeng ;
Rohrbach, Marcus ;
Yan, Zhicheng ;
Yan, Shuicheng ;
Feng, Jiashi ;
Kalantidis, Yannis .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :433-442
[8]  
Chen YP, 2018, ADV NEUR IN, V31
[9]   Video Person Re-Identification by Temporal Residual Learning [J].
Dai, Ju ;
Zhang, Pingping ;
Wang, Dong ;
Lu, Huchuan ;
Wang, Hongyu .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (03) :1366-1377
[10]  
Defferrard M, 2016, ADV NEUR IN, V29