Fusion Tree Network for RGBT Tracking

Citations: 7
Authors
Cheng, Zhiyuan [1]
Lu, Andong [1]
Zhang, Zhang [4, 5]
Li, Chenglong [2, 3]
Wang, Liang [4, 5]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China
[3] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[4] Ctr Res Intelligent Percept & Comp, NLPR, CASIA, Beijing, Peoples R China
[5] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2022 18TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/AVSS56176.2022.9959406
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
RGBT tracking is often affected by complex scenes (e.g., occlusions, scale changes, and noisy backgrounds). Existing works usually adopt a single-strategy fusion scheme to handle modality fusion in all scenarios. However, because the capacity of such a fusion model is limited, it is difficult to fully integrate the discriminative features of the different modalities. To tackle this problem, we propose a Fusion Tree Network (FTNet), which provides a high-capacity multi-strategy fusion model to efficiently fuse different modalities. Specifically, we combine three kinds of attention modules (channel attention, spatial attention, and location attention) in a tree structure to achieve multi-path hybrid attention in the deeper convolutional stages of the object tracking network. Extensive experiments are performed on three RGBT tracking datasets, and the results show that our method achieves superior performance among state-of-the-art RGBT tracking models.
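The idea of arranging several attention modules in a tree so that each root-to-leaf path applies a different combination can be illustrated with a small sketch. The following is not the FTNet implementation: the abstract does not specify the tree layout, the module internals, or how the paths are merged. Everything here is an assumption made for illustration, including the parameter-free attention functions (real modules would have learned weights), the strip-pooling-style `location_attention`, the hypothetical `TREE` layout, and the equal-weight averaging of the leaves.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameter-free stand-ins for the three attention types (assumed forms).
def channel_attention(x):
    # One weight per channel from global average pooling; x is (C, H, W).
    w = sigmoid(x.mean(axis=(1, 2)))
    return x * w[:, None, None]

def spatial_attention(x):
    # One weight per pixel from the channel-wise mean.
    m = sigmoid(x.mean(axis=0))
    return x * m[None, :, :]

def location_attention(x):
    # Assumed form: row and column strip pooling combined per position.
    rows = x.mean(axis=2, keepdims=True)   # (C, H, 1)
    cols = x.mean(axis=1, keepdims=True)   # (C, 1, W)
    return x * sigmoid(rows + cols)

# Hypothetical tree: each node is (operation, children); the root has no op.
TREE = (None, [
    (channel_attention, [(spatial_attention, []), (location_attention, [])]),
    (spatial_attention, [(location_attention, [])]),
])

def leaf_outputs(node, x):
    # Apply the node's operation, then recurse; collect one output per leaf.
    op, children = node
    y = x if op is None else op(x)
    if not children:
        return [y]
    outputs = []
    for child in children:
        outputs.extend(leaf_outputs(child, y))
    return outputs

def fuse(rgb, thermal):
    # Concatenate the two modalities along channels, run every path,
    # and average the leaves with equal weight (an assumed merge rule).
    x = np.concatenate([rgb, thermal], axis=0)
    return np.mean(leaf_outputs(TREE, x), axis=0)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 5, 6))
thermal = rng.standard_normal((4, 5, 6))
fused = fuse(rgb, thermal)
```

In this sketch the three paths are channel then spatial, channel then location, and spatial then location, so the fused feature mixes several attention combinations rather than relying on a single fusion strategy.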
Pages: 8
Related papers
32 in total
[1]  
Chatfield K, 2014, Arxiv, DOI [arXiv:1405.3531, DOI 10.48550/ARXIV.1405.3531]
[2]   Challenge-Aware RGBT Tracking [J].
Li, Chenglong ;
Liu, Lei ;
Lu, Andong ;
Ji, Qing ;
Tang, Jin .
COMPUTER VISION - ECCV 2020, PT XXII, 2020, 12367 :222-237
[3]  
Frosst N, 2017, Arxiv, DOI arXiv:1711.09784
[4]  
Gao Yuan, 2019, P IEEECVF INT C COMP
[5]   Strip Pooling: Rethinking Spatial Pooling for Scene Parsing [J].
Hou, Qibin ;
Zhang, Li ;
Cheng, Ming-Ming ;
Feng, Jiashi .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :4002-4011
[6]  
Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/CVPR.2018.00745, 10.1109/TPAMI.2019.2913372]
[7]  
Jetley S, 2018, Arxiv, DOI arXiv:1804.02391
[8]   Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization [J].
Ji, Ruyi ;
Wen, Longyin ;
Zhang, Libo ;
Du, Dawei ;
Wu, Yanjun ;
Zhao, Chen ;
Liu, Xianglong ;
Huang, Feiyue .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :10465-10474
[9]   Real-Time MDNet [J].
Jung, Ilchae ;
Son, Jeany ;
Baek, Mooyeol ;
Han, Bohyung .
COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 :89-104
[10]   SOWP: Spatially Ordered and Weighted Patch Descriptor for Visual Tracking [J].
Kim, Han-Ul ;
Lee, Dae-Youn ;
Sim, Jae-Young ;
Kim, Chang-Su .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :3011-3019