TLSH-MOT: Drone-View Video Multiple Object Tracking via Transformer-Based Locally Sensitive Hash

Citations: 0
Authors
Yuan, Yubin [1 ]
Wu, Yiquan [1 ]
Zhao, Langyue [1 ]
Liu, Yuqi [1 ]
Pang, Yaxuan [1 ]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 211106, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2025, Vol. 63
Funding
National Natural Science Foundation of China
Keywords
Remote sensing; Transformers; Feature extraction; Object tracking; Accuracy; Trajectory; Sensors; Computer vision; Video tracking; Surveillance; Local sensitive hash (LSH); multiple object tracking (MOT); spatiotemporal memory (STM); Transformer
DOI
10.1109/TGRS.2025.3545081
CLC Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline Codes
0708; 070902
Abstract
Multiple object tracking (MOT) plays an essential role in drone-view remote sensing applications, such as urban management, emergency rescue, and maritime monitoring. However, due to large variations in object scale and position, frequent feature loss across frames, and matching difficulties, traditional methods struggle to achieve high tracking accuracy in such challenging environments. To address these issues, we propose a Transformer-based locally sensitive hash MOT (TLSH-MOT) method for drone-view remote sensing scenarios. First, a frame-level feature extraction and enhancement module is introduced, integrating a nominee proposal generation (NPG) unit and a tilt convolutional vision Transformer (ViT), which enables adaptive detection of objects across varying scales and perspectives. Next, a spatiotemporal memory (STM) structure is designed to mitigate instantaneous environmental interference and periodic changes using short-term and long-term memory blocks, thereby enhancing tracking stability under complex meteorological conditions. In addition, a temporal enhancement feature decoder (TEFD) fuses multisource feature information to better capture the motion patterns of remote sensing objects. Finally, a local sensitive hash (LSH) IDLinker ensures efficient feature matching, significantly improving trajectory association in large-scale monitoring scenarios. Experimental results show that TLSH-MOT achieves an MOT accuracy of 40.7% and 62.2% on the VisDrone and UAVDT datasets, respectively, which verifies the superiority of TLSH-MOT in the remote sensing video tracking field. The framework's code is released at: https://github.com/YubinYuan/TLSH-MOT.
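This record does not detail how the LSH IDLinker works internally, but the general idea behind LSH-based trajectory association can be illustrated with a minimal sketch. The Python code below (assuming NumPy) uses random-hyperplane hashing (SimHash-style locality-sensitive hashing) to bucket appearance embeddings, so that each detection is compared only against tracks that land in the same bucket instead of against every track. The names `RandomHyperplaneLSH` and `associate` and all parameter values are hypothetical illustrations of the technique, not the authors' implementation.

```python
# Hedged sketch: random-hyperplane locality-sensitive hashing for
# candidate matching between track and detection embeddings. This is
# an illustration of the general LSH idea, not the paper's IDLinker.
import numpy as np


class RandomHyperplaneLSH:
    """Hashes embeddings by the signs of random projections, so vectors
    with high cosine similarity are likely to share a bucket key."""

    def __init__(self, dim: int, n_bits: int = 12, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane.
        self.planes = rng.standard_normal((n_bits, dim))

    def key(self, x: np.ndarray) -> int:
        bits = (self.planes @ x) > 0      # one sign bit per hyperplane
        out = 0
        for b in bits:                    # pack the bits into an int key
            out = (out << 1) | int(b)
        return out


def associate(track_embs: dict, det_embs: dict,
              lsh: RandomHyperplaneLSH, sim_thresh: float = 0.6) -> dict:
    """Bucket track embeddings once, then probe each detection's bucket
    and keep the most cosine-similar track above the threshold. This
    avoids the brute-force O(tracks x detections) comparison."""
    buckets: dict = {}
    for tid, emb in track_embs.items():
        buckets.setdefault(lsh.key(emb), []).append(tid)

    matches = {}
    for did, demb in det_embs.items():
        best_tid, best_sim = None, sim_thresh
        for tid in buckets.get(lsh.key(demb), []):
            temb = track_embs[tid]
            sim = float(demb @ temb /
                        (np.linalg.norm(demb) * np.linalg.norm(temb) + 1e-9))
            if sim > best_sim:
                best_tid, best_sim = tid, sim
        if best_tid is not None:
            matches[did] = best_tid
    return matches


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lsh = RandomHyperplaneLSH(dim=128)
    tracks = {t: rng.standard_normal(128) for t in range(200)}
    # A detection that is a slightly perturbed copy of track 7.
    dets = {0: tracks[7] + 0.05 * rng.standard_normal(128)}
    # Likely prints {0: 7}; LSH is probabilistic, so production systems
    # typically query several independent hash tables.
    print(associate(tracks, dets, lsh))
```

Because hash collisions are probabilistic, a single table can miss a true match; the standard remedy is to maintain several independent hash tables and take the union of probed buckets, trading a little memory for recall.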
Pages: 16