End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models

Cited by: 1
Authors
Yuan, Yubin [1 ]
Wu, Yiquan [1 ]
Zhao, Langyue [1 ]
Liu, Yuqi [1 ]
Pang, Yaxuan [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Issue 1
Funding
National Natural Science Foundation of China
Keywords
Multi-object tracking; Transformer; End-to-end; Cross-frame long-term interaction
DOI
10.1038/s41598-024-75934-9
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and Earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline codes
07; 0710; 09
Abstract
Drone aerial imaging has become increasingly important across numerous fields as drone optical sensor technology continues to advance. One critical challenge in this domain is achieving both accurate and efficient multi-object tracking. Traditional deep learning methods often separate object identification from tracking, leading to increased complexity and potential performance degradation. Conventional approaches rely heavily on manual feature engineering and intricate algorithms, which can further limit efficiency. To overcome these limitations, we propose a novel Transformer-based end-to-end multi-object tracking framework. This method leverages self-attention mechanisms to capture complex inter-object relationships, seamlessly integrating object detection and tracking into a unified process. By utilizing end-to-end training, our approach simplifies the tracking pipeline, leading to significant performance improvements. A key innovation in our system is the introduction of a trajectory detection label matching technique. This technique assigns labels based on a comprehensive assessment of object appearance, spatial characteristics, and Gaussian features, ensuring more precise and logical label assignments. Additionally, we incorporate cross-frame self-attention mechanisms to extract long-term object properties, providing robust information for stable and consistent tracking. We further enhance tracking performance through a newly developed self-characteristics module, which extracts semantic features from trajectory information across both current and previous frames. This module ensures that the long-term interaction modules maintain semantic consistency, allowing for more accurate and continuous tracking over time. The refined data and stored trajectories are then used as input for subsequent frame processing, creating a feedback loop that sustains tracking accuracy. Extensive experiments conducted on the VisDrone and UAVDT datasets demonstrate the superior performance of our approach in drone-based multi-object tracking.
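The trajectory detection label matching described in the abstract combines appearance, spatial, and Gaussian cues into a single assignment cost between stored trajectories and new detections. A minimal sketch of one way such a combined-cost matcher could look (the function names, weight values, diagonal-Gaussian distance, and greedy assignment are illustrative assumptions, not the authors' implementation):

```python
import math

def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine_dist(u, v):
    # appearance cue: 1 - cosine similarity of embedding vectors
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def gaussian_dist(center, mean, var):
    # Mahalanobis-style distance under a diagonal Gaussian motion model
    return sum((c - m) ** 2 / v for c, m, v in zip(center, mean, var))

def match(tracks, dets, w_app=0.5, w_spa=0.3, w_gau=0.2, max_cost=1.5):
    # build a combined cost for every track/detection pair,
    # then assign greedily from cheapest pair upward (one-to-one)
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(dets):
            cost = (w_app * cosine_dist(t["emb"], d["emb"])
                    + w_spa * (1.0 - iou(t["box"], d["box"]))
                    + w_gau * min(1.0, gaussian_dist(d["center"],
                                                    t["mean"], t["var"]) / 9.0))
            pairs.append((cost, ti, di))
    pairs.sort()
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost <= max_cost and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

In a full tracker the greedy loop would typically be replaced by an optimal assignment solver (e.g. the Hungarian algorithm), and the unmatched detections would seed new trajectories, but the combined-cost structure stays the same.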
Pages: 16