Spatio-temporal mix deformable feature extractor in visual tracking

Cited by: 2
Authors
Huang, Yucheng [1]
Xiao, Ziwang [1]
Firkat, Eksan [1]
Zhang, Jinlai [4]
Wu, Danfeng [2, 3]
Hamdulla, Askar [1]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing, Peoples R China
[3] Beijing Union Univ, Coll Robot, Beijing, Peoples R China
[4] Changsha Univ Sci & Technol, Coll Automot & Mech Engn, Changsha 410114, Peoples R China
Keywords
Object tracking; Self-attention; Convolution; Feature fusion
DOI
10.1016/j.eswa.2023.121377
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The emergence of ACMix fundamentally integrates convolution and self-attention mechanisms, fully leveraging their advantages. However, it faces challenges in associating temporal sequences and struggles to achieve accurate feature sampling. Additionally, its global correlation ability makes it susceptible to interference from irrelevant information. To address these issues, we propose the Spatio-Temporal Deformable Mix Feature Extractor (STD-ME) based on ACMix. In STD-ME, we designed deformable modules for both convolution and attention branches, incorporating spatio-temporal context to enable more precise feature sampling. By integrating STD-ME into a tracker that employs multi-frame fusion, we aim to further enhance its performance. The utilization of Crop-Transform-Paste for manual data synthesis offers a novel perspective for self-supervised tracking. However, it is important to note that while this method has shown impressive results, the synthesized data lacks spatio-temporal continuity in attributes such as scale variation, rotation, illumination variation, position, and partial occlusion, which limits its alignment with real-world scenarios. Consequently, training trackers based on multi-frame fusion may face challenges in achieving significant breakthroughs. To overcome this limitation, we introduce Spatial-Temporal Transformation (STT). STT utilizes an Iterative Random Number Generator (IRNG) based on a normal distribution to probabilistically generate spatio-temporal continuous data. Finally, we conducted extensive experiments on STD-ME and STT to demonstrate the effectiveness of our proposed methods.
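The abstract does not spell out how the Iterative Random Number Generator (IRNG) works. As a rough illustration of the idea only, the sketch below assumes that each frame's transformation parameter is drawn from a normal distribution centred on the previous frame's value, so that the parameter drifts smoothly over time instead of jumping independently per frame. The function name, the clamping step, and all numeric values are hypothetical and are not taken from the paper.

```python
import random


def iterative_normal_sequence(start, sigma, num_frames, low, high, seed=None):
    """Hypothetical IRNG-style sampler (an assumption, not the paper's method).

    Each value is drawn from a normal distribution centred on the previous
    value, then clamped to [low, high], which keeps consecutive frames close.
    """
    rng = random.Random(seed)
    values = [start]
    for _ in range(num_frames - 1):
        sample = rng.gauss(values[-1], sigma)      # centre on the previous value
        values.append(min(high, max(low, sample)))  # keep within a valid range
    return values


# Illustrative parameters only: a scale factor and a rotation angle (degrees)
# for an 8-frame synthetic sequence.
scales = iterative_normal_sequence(1.0, 0.02, 8, low=0.5, high=2.0, seed=0)
angles = iterative_normal_sequence(0.0, 1.5, 8, low=-45.0, high=45.0, seed=0)
```

Under this assumption, the per-frame values could drive the scale and rotation applied to a pasted target crop, giving the synthesized sequence the frame-to-frame continuity that independent per-frame sampling lacks.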
Pages: 18