Deep Triply Attention Network for RGBT Tracking

Cited by: 4
Authors
Yang, Rui [1 ,2 ]
Wang, Xiao [1 ,2 ]
Zhu, Yabin [1 ,3 ]
Tang, Jin [1 ,2 ]
Affiliations
[1] Anhui Univ, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
[3] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGBT tracking; Local attention; Co-attention mechanism; Global proposals; Visual tracking; Model
DOI
10.1007/s12559-023-10158-z
CLC number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RGB-Thermal (RGBT) tracking has gained significant attention in computer vision due to its wide range of applications in video surveillance, autonomous driving, and human-computer interaction. This paper focuses on achieving a robust fusion of different modalities for RGBT tracking through attention modeling. We propose an effective triply attentive network for robust RGBT tracking, which consists of a local attention module, a cross-modality co-attention module, and a global attention module. The local attention module enables the tracker to focus on target regions while accounting for background interference; it is generated by back-propagating the score map with respect to the RGB and thermal image pair. To enhance the interaction between modalities during feature learning, we introduce a co-attention module that selects more discriminative features for the visible (RGB) and thermal modalities simultaneously. To compensate for the limitations of local sampling, we incorporate a global attention module that exploits multi-modal information to compute high-quality global proposals; this module not only complements the local search strategy but also re-acquires lost targets when they re-enter the view. Extensive experiments on three RGBT tracking datasets demonstrate that our method outperforms other RGBT trackers. On the LasHeR dataset, the precision rate, normalized precision rate, and success rate reach 57.5%, 51.6%, and 41.0%, respectively. These state-of-the-art results confirm the effectiveness of our method in exploiting the complementary advantages between modalities and achieving robust visual tracking.
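The abstract describes the local attention maps as being obtained by back-propagating the tracker's score map with respect to the RGB and thermal inputs. The following minimal sketch (not the authors' released code; the two-stream `model` interface, input shapes, and gradient-magnitude normalization are assumptions) illustrates one way such gradient-based attention maps could be computed in PyTorch:

```python
import torch

def local_attention_maps(model, rgb, thermal):
    """Hypothetical gradient-based attention for an RGB/thermal pair.

    `model` is assumed to accept (rgb, thermal) tensors and return a
    target confidence score (or score map, summed to a scalar here).
    """
    # Enable gradients on the inputs so backprop reaches the image pair.
    rgb = rgb.clone().detach().requires_grad_(True)
    thermal = thermal.clone().detach().requires_grad_(True)

    score = model(rgb, thermal).sum()  # scalar target score
    score.backward()                   # gradients w.r.t. both inputs

    def to_attention(grad):
        # Channel-wise gradient magnitude -> spatial map, normalized to [0, 1].
        attn = grad.abs().sum(dim=1, keepdim=True)
        return attn / (attn.max() + 1e-8)

    return to_attention(rgb.grad), to_attention(thermal.grad)
```

Regions whose perturbation most affects the score receive the largest gradient magnitudes, so the resulting maps highlight the target while still reflecting nearby background interference, which matches the intuition given in the abstract.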
Pages: 1934-1946
Number of pages: 13