A Review of Deep Learning-Based Visual Multi-Object Tracking Algorithms for Autonomous Driving

Cited by: 36
Authors
Guo, Shuman [1 ]
Wang, Shichang [1 ]
Yang, Zhenzhong [1 ]
Wang, Lijun [1 ]
Zhang, Huawei [2 ]
Guo, Pengyan [1 ]
Gao, Yuguo [1 ]
Guo, Junkai [1 ]
Affiliations
[1] N China Univ Water Resources & Elect Power, Sch Mech Engn, Zhengzhou 457003, Peoples R China
[2] Yunnan Vocat Coll Transportat, Sch Intelligent Transportat, Kunming 650500, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Volume 12 / Issue 21
Keywords
autonomous driving; deep learning; visual multi-object tracking; transformer; JOINT DETECTION; FEATURES;
DOI
10.3390/app122110741
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Multi-object tracking, a high-level task in computer vision, is crucial to understanding the surroundings of an autonomous vehicle. Many strong multi-object tracking algorithms have emerged in recent years, driven by deep learning's outstanding performance in visual object tracking. Several reviews address individual sub-problems, but none covers the challenges, datasets, and algorithms associated with visual multi-object tracking in autonomous driving scenarios. In this paper, we present an exhaustive study of visual multi-object tracking algorithms from the last ten years, based on a systematic review approach. The algorithms are divided into three groups according to their structure: tracking by detection (TBD), joint detection and tracking (JDT), and Transformer-based tracking. The review finds that TBD algorithms have a straightforward structure, but the coupling between their individual sub-modules is weak. JDT algorithms track multiple objects by combining multi-module joint learning with a deep network framework. Transformer-based algorithms have been explored over the past two years; they show advantages on numerous evaluation metrics and have tremendous research potential in multi-object tracking. This paper provides theoretical support for algorithmic research in adjacent disciplines. Additionally, the approach we discuss, which uses only monocular cameras rather than sophisticated sensor fusion, is expected to pave the way for the rapid development of safe and affordable autonomous driving systems.
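To illustrate the tracking-by-detection (TBD) structure named in the abstract, the following is a minimal sketch of the association step, in which a separate detector's boxes are matched to existing tracks frame by frame. It is not taken from the paper. The function names (`iou`, `associate`, `update_tracks`), the box format `(x1, y1, x2, y2)`, and the threshold of 0.3 are assumptions chosen for illustration. Many TBD trackers solve the matching with the Hungarian algorithm; greedy matching is substituted here only to keep the sketch free of dependencies.

```python
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    iou_threshold: float = 0.3,  # hypothetical value, not from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match track IDs to detection indices by descending IoU."""
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little
        if tid in matches or j in used:
            continue  # each track and each detection is matched at most once
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


def update_tracks(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    iou_threshold: float = 0.3,
) -> Tuple[Dict[int, Box], int]:
    """One frame of TBD: keep matched tracks, start new ones, drop the rest."""
    matches, unmatched = associate(tracks, detections, iou_threshold)
    updated = {tid: detections[j] for tid, j in matches.items()}
    for j in unmatched:
        updated[next_id] = detections[j]  # unmatched detection starts a track
        next_id += 1
    return updated, next_id


if __name__ == "__main__":
    previous = {0: (0, 0, 10, 10), 1: (20, 20, 30, 30)}
    current = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(update_tracks(previous, current, next_id=2))
```

In this sketch the detector and the association step share nothing except the boxes, which is one way to picture the weak coupling between TBD sub-modules that the abstract describes. Tracks with no matching detection are simply dropped; a practical tracker would typically add motion prediction and a grace period before deleting a track.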
Pages: 27