Online Multi-Scale Classification and Global Feature Modulation for Robust Visual Tracking

Cited by: 3
Authors
Gao, Qi [1]
Yin, Mingfeng [2]
Wu, Xiang [3]
Liu, Di [4]
Bo, Yuming [3]
Affiliations
[1] Jiangsu Univ Technol, Coll Mech Engn, Changzhou 213001, Peoples R China
[2] Jiangsu Univ Technol, Sch Automobile & Traff Engn, Changzhou 213001, Peoples R China
[3] Nanjing Univ Sci & Technol, Sch Automat, Nanjing 210094, Peoples R China
[4] Nanjing Inst Technol, Sch Automat, Nanjing 211167, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Target tracking; Accuracy; Fuses; Modulation; Transformers; Real-time systems; Visual object tracking; coordinate attention; online multi-scale classification; global feature modulation; OBJECT TRACKING;
DOI
10.1109/TCSVT.2023.3343949
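Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract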
Recent advanced trackers, composed of discriminative classification and dedicated bounding box estimation, have achieved remarkable gains in visual object tracking performance. However, existing methods cannot satisfy the demands of tracking tasks in complex scenes involving challenges such as occlusion and scale variation. To this end, we propose ATOM+, a robust visual tracker with online multi-scale classification and global feature modulation, built on accurate tracking by overlap maximization (ATOM). First, coordinate attention (CA) is applied to enhance the target features in the channel and spatial dimensions, which effectively improves the feature representation ability of the backbone network. Second, an online multi-scale classification (OMC) module is designed. During the online tracking phase, it generates more reliable matching responses by aggregating information from different scales related to the target. This operation enables the tracker to perceive the target stably, particularly when the appearance and posture of the target change severely. Third, a global feature modulation (GFM) mechanism, which requires only a small amount of computational resources, is constructed to fuse the spatial contextual information of the template image into the search region. This integration refines the bounding box to obtain an accurate estimate of the target state. Finally, comprehensive experiments on the conventional tracking benchmarks OTB100, LaSOT, and VOT2018 show that our tracker handles a range of challenging scenarios and achieves state-of-the-art performance. The tracker runs in real time at an average speed of 37 FPS.
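The abstract does not say how the OMC module aggregates responses from different scales. As a rough illustration of the general idea only, the sketch below fuses several classification score maps by a weighted average and reads off the peak location. It assumes the maps have already been resampled to a common size; the function names `aggregate_score_maps` and `peak`, the weights, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

ScoreMap = List[List[float]]


def aggregate_score_maps(maps: Sequence[ScoreMap],
                         weights: Sequence[float]) -> ScoreMap:
    """Fuse same-sized score maps with a normalized weighted average.

    Assumption: every map was already resized to a common resolution.
    This is a generic fusion rule, not the paper's OMC module.
    """
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per score map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for score_map, weight in zip(maps, weights):
        if len(score_map) != rows or any(len(r) != cols for r in score_map):
            raise ValueError("all score maps must share one shape")
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += (weight / total) * score_map[i][j]
    return fused


def peak(score_map: ScoreMap) -> Tuple[int, int]:
    """Return (row, col) of the highest score; first hit wins on ties."""
    best_pos, best_val = (0, 0), score_map[0][0]
    for i, row in enumerate(score_map):
        for j, value in enumerate(row):
            if value > best_val:
                best_pos, best_val = (i, j), value
    return best_pos


if __name__ == "__main__":
    # Toy responses from two hypothetical scales of the same search region.
    fine = [[0.1, 0.9], [0.2, 0.3]]
    coarse = [[0.2, 0.5], [0.8, 0.1]]
    fused = aggregate_score_maps([fine, coarse], weights=[1.0, 1.0])
    print("fused map:", fused)
    print("estimated target cell:", peak(fused))
```

Averaging is the simplest possible fusion rule. A response that is strong at only one scale is damped, while a location supported at several scales keeps a high score, which is the intuition behind aggregating multi-scale information for a more stable target response.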
Pages: 5321-5334
Number of pages: 14