TLSH-MOT: Drone-View Video Multiple Object Tracking via Transformer-Based Locally Sensitive Hash

Citations: 0
Authors
Yuan, Yubin [1 ]
Wu, Yiquan [1 ]
Zhao, Langyue [1 ]
Liu, Yuqi [1 ]
Pang, Yaxuan [1 ]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 211106, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2025, Vol. 63
Funding
National Natural Science Foundation of China
Keywords
Remote sensing; Transformers; Feature extraction; Object tracking; Accuracy; Trajectory; Sensors; Computer vision; Video tracking; Surveillance; Local sensitive hash (LSH); multiple object tracking (MOT); spatiotemporal memory (STM); Transformer
DOI
10.1109/TGRS.2025.3545081
CLC Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline Codes
0708; 070902
Abstract
Multiple object tracking (MOT) plays an essential role in drone-view remote sensing applications, such as urban management, emergency rescue, and maritime monitoring. However, due to large variations in object scale and position, frequent feature loss across frames, and matching difficulties, traditional methods struggle to achieve high tracking accuracy in such challenging environments. To address these issues, we propose a Transformer-based locally sensitive hash MOT (TLSH-MOT) method for drone-view remote sensing scenarios. First, a frame-level feature extraction and enhancement module is introduced, integrating a nominee proposal generation (NPG) unit and a tilt convolutional vision Transformer (ViT), which enables adaptive detection of objects across varying scales and perspectives. Next, a spatiotemporal memory (STM) structure is designed to mitigate instantaneous environmental interference and periodic changes using short-term and long-term memory blocks, thereby enhancing tracking stability under complex meteorological conditions. In addition, a temporal enhancement feature decoder (TEFD) fuses multisource feature information to better capture the motion patterns of remote sensing objects. Finally, a local sensitive hash (LSH) IDLinker ensures efficient feature matching, significantly improving trajectory association in large-scale monitoring scenarios. Experimental results show that TLSH-MOT achieves an MOT accuracy of 40.7% and 62.2% on the VisDrone and UAVDT datasets, respectively, which verifies the superiority of TLSH-MOT in the remote sensing video tracking field. The framework's code is released at: https://github.com/YubinYuan/TLSH-MOT.
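This record does not detail how the LSH IDLinker works internally, but the general idea behind LSH-based trajectory association can be illustrated with a minimal sketch. The Python code below (assuming NumPy) uses random-hyperplane hashing (SimHash-style locality-sensitive hashing) to bucket appearance embeddings, so that each detection is compared only against tracks that land in the same bucket instead of against every track. The names `RandomHyperplaneLSH` and `associate` and all parameter values are hypothetical illustrations of the technique, not the authors' implementation.

```python
# Hedged sketch: random-hyperplane locality-sensitive hashing for
# candidate matching between track and detection embeddings. This is
# an illustration of the general LSH idea, not the paper's IDLinker.
import numpy as np


class RandomHyperplaneLSH:
    """Hashes embeddings by the signs of random projections, so vectors
    with high cosine similarity are likely to share a bucket key."""

    def __init__(self, dim: int, n_bits: int = 12, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane.
        self.planes = rng.standard_normal((n_bits, dim))

    def key(self, x: np.ndarray) -> int:
        bits = (self.planes @ x) > 0      # one sign bit per hyperplane
        out = 0
        for b in bits:                    # pack the bits into an int key
            out = (out << 1) | int(b)
        return out


def associate(track_embs: dict, det_embs: dict,
              lsh: RandomHyperplaneLSH, sim_thresh: float = 0.6) -> dict:
    """Bucket track embeddings once, then probe each detection's bucket
    and keep the most cosine-similar track above the threshold. This
    avoids the brute-force O(tracks x detections) comparison."""
    buckets: dict = {}
    for tid, emb in track_embs.items():
        buckets.setdefault(lsh.key(emb), []).append(tid)

    matches = {}
    for did, demb in det_embs.items():
        best_tid, best_sim = None, sim_thresh
        for tid in buckets.get(lsh.key(demb), []):
            temb = track_embs[tid]
            sim = float(demb @ temb /
                        (np.linalg.norm(demb) * np.linalg.norm(temb) + 1e-9))
            if sim > best_sim:
                best_tid, best_sim = tid, sim
        if best_tid is not None:
            matches[did] = best_tid
    return matches


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lsh = RandomHyperplaneLSH(dim=128)
    tracks = {t: rng.standard_normal(128) for t in range(200)}
    # A detection that is a slightly perturbed copy of track 7.
    dets = {0: tracks[7] + 0.05 * rng.standard_normal(128)}
    # Likely prints {0: 7}; LSH is probabilistic, so production systems
    # typically query several independent hash tables.
    print(associate(tracks, dets, lsh))
```

Because hash collisions are probabilistic, a single table can miss a true match; the standard remedy is to maintain several independent hash tables and take the union of probed buckets, trading a little memory for recall.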
Pages: 16