Learning modality feature fusion via transformer for RGBT-tracking

Cited by: 11

Authors:

Cai, Yujue [1]

Sui, Xiubao [1]

Gu, Guohua [1]

Chen, Qian [1]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Elect & Opt Engn, Nanjing 210014, Peoples R China

Source:

INFRARED PHYSICS & TECHNOLOGY | 2023 / Vol. 133

Funding:

National Natural Science Foundation of China;

Keywords:

RGB-T tracking; Deep learning; Transformer; Challenge-aware; Feature fusion; NETWORK;

DOI:

10.1016/j.infrared.2023.104819

Chinese Library Classification (CLC) number:

TH7 [Instruments and meters];

Discipline classification codes:

0804 ; 080401 ; 081102 ;

Abstract:

RGB-T tracking can be viewed as multi-view fusion tracking. In this study, we propose a transformer-based network, the Multi-Modal Mutual Propagation Tracker (MMMPT). To obtain a robust appearance model from multi-modal data, we adopt an encoder-decoder architecture to extract information. In the encoding stage, the template features of multiple frames reinforce their common features through the self-attention mechanism, yielding a time-invariant target representation. At the same time, the template features interact with the multi-modal data through cross-modal propagation, resulting in a modality-invariant representation of the target. The transformer decoder transfers useful information from the template to the search areas through a similarity matrix. We experiment on the RGBT234, GTOT, VTUAV and LasHeR datasets to assess the RGBT-transformer tracker. Extensive experiments indicate that the proposed framework is not inferior to state-of-the-art trackers in terms of robustness and accuracy.
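The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `attend`, `encode` and `decode` are hypothetical, attention is single-head with no learned projections or normalization layers, and features are plain lists of vectors. It only shows the three steps named in the abstract: self-attention over multi-frame template tokens, cross-modal propagation between the RGB and thermal branches, and a decoder that moves template information into the search region through a softmax similarity matrix.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on lists of feature vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)  # one row of the similarity matrix
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def add(x, y):
    """Element-wise residual sum of two token lists."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def encode(rgb_templates, tir_templates):
    """Assumed encoder: self-attention per modality, then cross-modal exchange.

    Each input holds template tokens gathered from several frames.
    """
    rgb = add(rgb_templates, attend(rgb_templates, rgb_templates, rgb_templates))
    tir = add(tir_templates, attend(tir_templates, tir_templates, tir_templates))
    # Cross-modal propagation: each modality queries the other one.
    rgb_out = add(rgb, attend(rgb, tir, tir))
    tir_out = add(tir, attend(tir, rgb, rgb))
    return rgb_out, tir_out


def decode(search, template):
    """Assumed decoder: search tokens pull information from template tokens."""
    return add(search, attend(search, template, template))


# Toy usage: 2 template tokens per modality, 3 search tokens, 2 channels.
rgb_t = [[1.0, 0.0], [0.0, 1.0]]
tir_t = [[0.5, 0.5], [1.0, -1.0]]
rgb_enc, tir_enc = encode(rgb_t, tir_t)
search_tokens = [[0.2, 0.1], [0.0, 0.3], [1.0, 1.0]]
enhanced = decode(search_tokens, rgb_enc)
```

The residual sums keep the original features alongside the propagated ones, a common transformer convention assumed here rather than taken from the record.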


Pages: 10
