Highly compact adaptive network based on transformer for RGBT tracking

Cited by: 0
Authors
Chen, Siqing [1]
Gao, Pan [2]
Wang, Xun [5]
Liao, Kuo [1]
Zhang, Ping [2, 3, 4]
Affiliations
[1] UESTC, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] UESTC, Sch Optoelect Sci & Engn, Chengdu 611731, Peoples R China
[3] UESTC, Shenzhen Inst Adv Study, Chengdu 611731, Peoples R China
[4] UESTC, Yibin Inst, Chengdu 611731, Peoples R China
[5] Southwest Inst Tech Phys, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision Transformer; RGBT tracking; Multi-modal fusion; FUSION;
DOI
10.1016/j.infrared.2024.105310
Chinese Library Classification number
TH7 [Instruments and Meters];
Discipline classification codes
0804; 080401; 081102;
Abstract
RGBT tracking is a challenging task that requires robust fusion of visible (RGB) and thermal infrared (TIR) images to handle scenarios such as illumination changes, occlusions, and camouflage. Most current RGBT trackers are based on two-stream Siamese trackers. They tend to extract features from the template and search-region images separately, which neglects the relationship between target and background. Moreover, they do not fuse RGB and TIR features properly, which limits their ability to exploit complementary features. To address these issues, we introduce a highly compact adaptive transformer-based network that unifies feature extraction and correlation between template and search images. The compact network has dual branches, which combine feature extraction and correlation for the different modalities of the template and search images. Meanwhile, we introduce a cross-modal weight redistribution (CMWR) module for multi-modal fusion. This adaptive fusion scheme learns discriminative features of the RGB and TIR data and assigns weights to them, enabling them to complement each other. Furthermore, to track targets of different scales, we design a scale-adaptive optimization pyramid (SAOP) module that adapts to objects of different sizes. Our method achieves exceptional performance on the GTOT, RGBT234, and LasHeR datasets, surpassing most existing methods. The results are consistent across these datasets, demonstrating the effectiveness and superiority of our approach. Our code is released at: https://github.com/ELOESZHANG/HCANet.
Pages: 14