Multi-granularity Feature Fusion for Transformer-Based Single Object Tracking

Cited by: 0
Authors
Wang, Ziye [1]
Miao, Duoqian [1]
Affiliation
[1] Tongji Univ, Dept Comp Sci & Technol, 4800 Caoan Highway, Shanghai 201804, Peoples R China
Source
ROUGH SETS, IJCRS 2023 | 2023 / Vol. 14481
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Computer vision; Single object tracking; Multi granularity; Rough set; Transformer; VISUAL TRACKING;
DOI
10.1007/978-3-031-50959-9_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recently developed transformer has been widely explored in computer vision and, in particular, has improved the performance of single object tracking. However, most current efforts concentrate on combining and enhancing features generated by convolutional neural networks (CNNs) and do not fully exploit the potential of the transformer. Motivated by this, we introduce multi-granularity theory into a pure transformer-based single object tracker and design a multi-granularity feature fusion module. To fuse features of different granularities and enhance the feature representation, we design a double-branch transformer feature extractor and use a cross-attention mechanism to fuse the features. Extensive experiments on multiple tracking benchmarks, including OTB2015, VOT2020, TrackingNet, GOT-10k, and LaSOT, demonstrate that the proposed tracker, named MGTT, achieves better performance than multiple state-of-the-art trackers.
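The abstract does not give the details of the fusion module, so the following is only a minimal sketch of the general idea of fusing two token sequences of different granularity with cross-attention. It is not the MGTT implementation. The token counts, the feature dimension, the absence of learned projections, and the residual sum are all assumptions made for illustration, and the function names (`cross_attention`, `fuse`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention from `queries` to `context`.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    Returns the attended features and the attention weights.
    """
    dim = len(queries[0])
    scores = matmul(queries, transpose(context))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, context), weights


def fuse(fine_tokens, coarse_tokens):
    """Let each branch attend to the other, then add the result residually.

    Assumption: a plain residual sum; the paper may combine the branches differently.
    """
    fine_ctx, _ = cross_attention(fine_tokens, coarse_tokens)
    coarse_ctx, _ = cross_attention(coarse_tokens, fine_tokens)
    fused_fine = [[a + b for a, b in zip(t, c)] for t, c in zip(fine_tokens, fine_ctx)]
    fused_coarse = [[a + b for a, b in zip(t, c)] for t, c in zip(coarse_tokens, coarse_ctx)]
    return fused_fine, fused_coarse


def random_tokens(count, dim, rng):
    """Stand-in for the output of one branch of a feature extractor."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
fine = random_tokens(16, 8, rng)    # hypothetical fine-granularity branch: 16 tokens
coarse = random_tokens(4, 8, rng)   # hypothetical coarse-granularity branch: 4 tokens
fused_fine, fused_coarse = fuse(fine, coarse)
print(len(fused_fine), len(fused_coarse))
```

Each branch keeps its own token count after fusion, and only the content of the tokens changes, so the fused sequences can be passed on to whatever prediction head follows.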
Pages: 311-323
Page count: 13