TrackingMamba: Visual State Space Model for Object Tracking

Cited by: 2
Authors
Wang, Qingwang [1 ,2 ]
Zhou, Liyao [1 ,2 ]
Jin, Pengcheng [1 ,2 ]
Xin, Qu [1 ,2 ]
Zhong, Hangwei [1 ,2 ]
Song, Haochen [1 ,2 ]
Shen, Tao [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Comp Technol Applicat, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object tracking; Autonomous aerial vehicles; Transformers; Feature extraction; Computational modeling; Accuracy; Visualization; Jungle scenes; Mamba; object tracking; UAV remote sensing;
DOI
10.1109/JSTARS.2024.3458938
CLC Classification Number
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
In recent years, UAV object tracking has provided technical support across various fields. Most existing work relies on convolutional neural networks (CNNs) or vision transformers. However, CNNs have limited receptive fields, resulting in suboptimal performance, while transformers require substantial computational resources, making training and inference challenging. Mountainous and jungle environments, which are critical components of the Earth's surface and key scenarios for UAV object tracking, present unique challenges: steep terrain, dense vegetation, and rapidly changing weather conditions all complicate UAV tracking, and the lack of relevant datasets further reduces tracking accuracy. This article introduces a new tracking framework based on a state-space model, called TrackingMamba, which uses a single-stream tracking architecture with Vision Mamba as its backbone. TrackingMamba not only matches transformer-based trackers in global feature extraction and long-range dependency modeling but also keeps computational cost growing only linearly with sequence length. Compared to other advanced trackers, TrackingMamba delivers higher accuracy with a simpler model framework, fewer parameters, and reduced FLOPs. Specifically, on the UAV123 benchmark, TrackingMamba outperforms the baseline model OSTrack-256, improving AUC by 2.59% and Precision by 4.42%, while reducing parameters by 95.52% and FLOPs by 95.02%. The article also evaluates the performance and shortcomings of TrackingMamba and other advanced trackers in the complex and critical context of jungle environments, and it explores potential future research directions in UAV jungle object tracking.
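The mechanism behind a Mamba-style backbone is a discretized state-space recurrence scanned over the token sequence in linear time, which is what allows the single-stream design (template and search-region patches concatenated into one sequence) to stay cheap. The sketch below is a minimal, illustrative NumPy version of such a scan; the function name, tensor shapes, and the diagonal-A zero-order-hold discretization are assumptions for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def ssm_scan(x, A, B, C, dt):
    """Minimal diagonal state-space scan over a token sequence (illustrative only).

    x  : (L, D)  input tokens (template tokens followed by search tokens)
    A  : (D, N)  continuous-time state matrix (diagonal, negative for stability)
    B  : (D, N)  input projection
    C  : (D, N)  output projection
    dt : (D,)    per-channel step size
    """
    L, D = x.shape
    # Zero-order-hold discretization, elementwise because A is diagonal:
    #   h_t = A_bar * h_{t-1} + B_bar * x_t,   y_t = C . h_t
    A_bar = np.exp(dt[:, None] * A)            # (D, N)
    B_bar = (A_bar - 1.0) / A * B              # (D, N)
    h = np.zeros_like(A)                       # hidden state per channel
    y = np.empty_like(x)
    for t in range(L):                         # single pass, linear in L
        h = A_bar * h + B_bar * x[t][:, None]  # state update
        y[t] = np.sum(C * h, axis=1)           # readout
    return y

# Usage sketch: concatenate template and search tokens, then scan once.
rng = np.random.default_rng(0)
D, N = 16, 8
template = rng.standard_normal((64, D))        # e.g. 8x8 template patches
search = rng.standard_normal((256, D))         # e.g. 16x16 search-region patches
tokens = np.concatenate([template, search], axis=0)
A = -np.exp(rng.standard_normal((D, N)))       # negative real parts keep the scan stable
B = rng.standard_normal((D, N))
C = rng.standard_normal((D, N))
dt = np.full(D, 0.1)
features = ssm_scan(tokens, A, B, C, dt)       # (320, D) fused template/search features
```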
Pages: 16744-16754
Page count: 11