End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models

Cited by: 1
Authors
Yuan, Yubin [1 ]
Wu, Yiquan [1 ]
Zhao, Langyue [1 ]
Liu, Yuqi [1 ]
Pang, Yaxuan [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Issue 1
Funding
National Natural Science Foundation of China
Keywords
Multi-object tracking; Transformer; End-to-end; Cross-frame long-term interaction
DOI
10.1038/s41598-024-75934-9
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and Earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline codes
07; 0710; 09
Abstract
Drone aerial imaging has become increasingly important across numerous fields as drone optical sensor technology continues to advance. One critical challenge in this domain is achieving both accurate and efficient multi-object tracking. Traditional deep learning methods often separate object identification from tracking, leading to increased complexity and potential performance degradation. Conventional approaches rely heavily on manual feature engineering and intricate algorithms, which can further limit efficiency. To overcome these limitations, we propose a novel Transformer-based end-to-end multi-object tracking framework. This method leverages self-attention mechanisms to capture complex inter-object relationships, seamlessly integrating object detection and tracking into a unified process. By utilizing end-to-end training, our approach simplifies the tracking pipeline, leading to significant performance improvements. A key innovation in our system is the introduction of a trajectory detection label matching technique. This technique assigns labels based on a comprehensive assessment of object appearance, spatial characteristics, and Gaussian features, ensuring more precise and logical label assignments. Additionally, we incorporate cross-frame self-attention mechanisms to extract long-term object properties, providing robust information for stable and consistent tracking. We further enhance tracking performance through a newly developed self-characteristics module, which extracts semantic features from trajectory information across both current and previous frames. This module ensures that the long-term interaction modules maintain semantic consistency, allowing for more accurate and continuous tracking over time. The refined data and stored trajectories are then used as input for subsequent frame processing, creating a feedback loop that sustains tracking accuracy. Extensive experiments conducted on the VisDrone and UAVDT datasets demonstrate the superior performance of our approach in drone-based multi-object tracking.
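The trajectory detection label matching described in the abstract combines appearance, spatial, and Gaussian cues into a single assignment cost between stored trajectories and new detections. A minimal sketch of one way such a combined-cost matcher could look (the function names, weight values, diagonal-Gaussian distance, and greedy assignment are illustrative assumptions, not the authors' implementation):

```python
import math

def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine_dist(u, v):
    # appearance cue: 1 - cosine similarity of embedding vectors
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def gaussian_dist(center, mean, var):
    # Mahalanobis-style distance under a diagonal Gaussian motion model
    return sum((c - m) ** 2 / v for c, m, v in zip(center, mean, var))

def match(tracks, dets, w_app=0.5, w_spa=0.3, w_gau=0.2, max_cost=1.5):
    # build a combined cost for every track/detection pair,
    # then assign greedily from cheapest pair upward (one-to-one)
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(dets):
            cost = (w_app * cosine_dist(t["emb"], d["emb"])
                    + w_spa * (1.0 - iou(t["box"], d["box"]))
                    + w_gau * min(1.0, gaussian_dist(d["center"],
                                                    t["mean"], t["var"]) / 9.0))
            pairs.append((cost, ti, di))
    pairs.sort()
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost <= max_cost and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

In a full tracker the greedy loop would typically be replaced by an optimal assignment solver (e.g. the Hungarian algorithm), and the unmatched detections would seed new trajectories, but the combined-cost structure stays the same.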
Pages: 16