Highly compact adaptive network based on transformer for RGBT tracking

Cited by: 0

Authors:

Chen, Siqing [1]

Gao, Pan [2]

Wang, Xun [5]

Liao, Kuo [1]

Zhang, Ping [2,3,4]

Affiliations:

[1] UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

[2] UESTC, Sch Optoelect Sci & Engn, Chengdu 611731, Peoples R China

[3] UESTC, Shenzhen Inst Adv Study, Chengdu 611731, Peoples R China

[4] UESTC, Yibin Inst, Chengdu 611731, Peoples R China

[5] Southwest Inst Tech Phys, Chengdu 611731, Peoples R China

Source:

INFRARED PHYSICS & TECHNOLOGY | 2024 / Vol. 139

Funding:

National Natural Science Foundation of China;

Keywords:

Vision Transformer; RGBT tracking; Multi-modal fusion; FUSION;

DOI:

10.1016/j.infrared.2024.105310

Chinese Library Classification:

TH7 [Instruments and Meters];

Discipline classification codes:

0804 ; 080401 ; 081102 ;

Abstract:

RGBT tracking is a challenging task that requires robust fusion of visible (RGB) and thermal infrared (TIR) images to handle scenarios such as illumination changes, occlusions, and camouflage. Most popular RGBT trackers are currently based on two-stream Siamese trackers. They tend to extract features from the template and search image regions separately, which neglects the relationship between target and background. Moreover, they do not fuse RGB and TIR features properly, which limits their ability to exploit complementary features. To address these issues, we introduce a highly compact adaptive transformer-based network that unifies feature extraction and correlation between template and search images. The compact network has dual branches that combine feature extraction and correlation for the different modalities of the template and search images. Meanwhile, we introduce a cross-modal weight redistribution (CMWR) module for multi-modal fusion. This adaptive fusion scheme learns discriminative features of RGB and TIR data and assigns weights to them, enabling them to complement each other. Furthermore, to track targets of different scales, we design a scale-adaptive optimization pyramid (SAOP) module that adapts to objects of different sizes. Our method achieves exceptional performance on the GTOT, RGBT234 and LasHeR datasets, surpassing most existing methods. The results are consistent across multiple datasets, demonstrating the effectiveness and superiority of our approach. Our code is released at: https://github.com/ELOESZHANG/HCANet.


Pages: 14

References: 53 in total

