TrackingMamba: Visual State Space Model for Object Tracking

Cited by: 2
Authors
Wang, Qingwang [1 ,2 ]
Zhou, Liyao [1 ,2 ]
Jin, Pengcheng [1 ,2 ]
Xin, Qu [1 ,2 ]
Zhong, Hangwei [1 ,2 ]
Song, Haochen [1 ,2 ]
Shen, Tao [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Comp Technol Applicat, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object tracking; Autonomous aerial vehicles; Transformers; Feature extraction; Computational modeling; Accuracy; Visualization; Jungle scenes; Mamba; object tracking; UAV remote sensing;
DOI
10.1109/JSTARS.2024.3458938
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
In recent years, UAV object tracking has provided technical support across various fields. Most existing work relies on convolutional neural networks (CNNs) or vision transformers. However, CNNs have limited receptive fields, resulting in suboptimal performance, while transformers require substantial computational resources, making training and inference challenging. Mountainous and jungle environments, critical components of the Earth's surface and key scenarios for UAV object tracking, present unique challenges: steep terrain, dense vegetation, and rapidly changing weather conditions complicate UAV tracking, and the lack of relevant datasets further reduces tracking accuracy. This article introduces a new tracking framework based on a state-space model, called TrackingMamba, which uses a single-stream tracking architecture with Vision Mamba as its backbone. TrackingMamba not only matches transformer-based trackers in global feature extraction and long-range dependency modeling but also keeps computational cost growing only linearly with sequence length. Compared to other advanced trackers, TrackingMamba delivers higher accuracy with a simpler model framework, fewer parameters, and fewer FLOPs. Specifically, on the UAV123 benchmark, TrackingMamba outperforms the baseline model OSTrack-256, improving AUC by 2.59% and precision by 4.42% while reducing parameters by 95.52% and FLOPs by 95.02%. The article also evaluates the performance and shortcomings of TrackingMamba and other advanced trackers in the complex and critical context of jungle environments, and it explores potential future research directions in UAV jungle object tracking.
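The linear-complexity claim in the abstract rests on the discrete state-space recurrence that underlies Mamba-style models: the hidden state is updated once per token, so cost grows linearly with sequence length instead of quadratically as in attention. The sketch below is purely illustrative and not the paper's implementation; the function name `ssm_scan` and the matrices `A`, `B`, `C` are generic placeholders, and real Mamba models additionally make these parameters input-dependent (selective).

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal discrete state-space recurrence over a 1-D sequence.

    Implements h_t = A h_{t-1} + B x_t and y_t = C h_t, the linear
    recurrence at the core of state-space models such as Mamba.

    x: (T,) input sequence
    A: (N, N) state-transition matrix
    B: (N,)  input projection
    C: (N,)  output projection
    Returns y: (T,) output sequence. One state update per step, so the
    total cost is O(T), unlike the O(T^2) pairwise attention of a
    transformer.
    """
    N = A.shape[0]
    h = np.zeros(N)                      # hidden state, carried across steps
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t              # state update
        y[t] = C @ h                     # readout
    return y

# Tiny usage example: a decaying 1-D state echoes an impulse input.
y = ssm_scan(np.array([1.0, 0.0, 0.0]),
             A=0.5 * np.eye(1), B=np.array([1.0]), C=np.array([1.0]))
print(y)  # [1.0, 0.5, 0.25]
```

In a vision setting, `x` would be the flattened sequence of image-patch embeddings; the linear scan is what lets the backbone cover the whole frame (global receptive field) without a transformer's quadratic cost.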
Pages: 16744-16754 (11 pages)
Related Papers (50 total)
  • [1] LCCDMamba: Visual State Space Model for Land Cover Change Detection of VHR Remote Sensing Images
    Huang, Junqing
    Yuan, Xiaochen
    Lam, Chan-Tong
    Wang, Yapeng
    Xia, Min
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 5765 - 5781
  • [2] Siamese Visual Object Tracking: A Survey
    Ondrasovic, Milan
    Tarabek, Peter
    IEEE ACCESS, 2021, 9 : 110149 - 110172
  • [3] The State-of-the-Art in Visual Object Tracking
    Jalal, Anand Singh
    Singh, Vrijendra
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2012, 36 (03): : 227 - 248
  • [4] OmniTracker: Unifying Visual Object Tracking by Tracking-With-Detection
    Wang, Junke
    Wu, Zuxuan
    Chen, Dongdong
    Luo, Chong
    Dai, Xiyang
    Yuan, Lu
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 3159 - 3174
  • [5] Sparse Transformer-Based Sequence Generation for Visual Object Tracking
    Tian, Dan
    Liu, Dong-Xin
    Wang, Xiao
    Hao, Ying
    IEEE ACCESS, 2024, 12 : 154418 - 154425
  • [6] A visual attention model for robot object tracking
    Chu J.-K.
    Li R.-H.
    Li Q.-Y.
    Wang H.-Q.
    International Journal of Automation and Computing, 2010, 7 (01) : 39 - 46
  • [7] A Visual Attention Model for Robot Object Tracking
    Chu, Jin-Kui
    Li, Rong-Hua
    Li, Qing-Ying
    Wang, Hong-Qing
    Machine Intelligence Research, 2010, (01) : 39 - 46
  • [8] Visual Attention Model Based Object Tracking
    Ma, Lili
    Cheng, Jian
    Liu, Jing
    Wang, Jinqiao
    Lu, Hanqing
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING-PCM 2010, PT II, 2010, 6298 : 483 - 493
  • [9] Feature Aggregation Networks Based on Dual Attention Capsules for Visual Object Tracking
    Cao, Yi
    Ji, Hongbing
    Zhang, Wenbo
    Shirani, Shahram
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 674 - 689
  • [10] Modeling of Multiple Spatial-Temporal Relations for Robust Visual Object Tracking
    Wang, Shilei
    Wang, Zhenhua
    Sun, Qianqian
    Cheng, Gong
    Ning, Jifeng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 5073 - 5085