YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection

Cited by: 35

Authors:

Wang, Kun [1]

Liu, Maozhen [1]

Affiliations:

[1] Civil Aviat Univ China, Coll Elect Informat & Automat, Tianjin 300300, Peoples R China

Source:

APPLIED INTELLIGENCE | 2022 / Vol. 52 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Class-wise k-means clustering; Occluded environment; YOLOv3; Kalman filter; MULTISCALE;

DOI:

10.1007/s10489-021-02491-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In autonomous driving, complex backgrounds and mutual occlusion between multiple targets impair the detector's judgment and lead to missed detections. When a close-range target is captured again only after being occluded, the vehicle may not be able to respond in time, which can cause a fatal accident. Driver-assistance systems therefore require a model that can accurately identify partially occluded targets in complex backgrounds and can perform short-term tracking and early warning for completely occluded objects. This paper proposes a YOLOv3-based method that improves detection accuracy while supporting real-time operation and provides real-time warnings for objects that are completely occluded. First, we obtain more suitable prior box (anchor) settings through class-wise K-means clustering. Because the max-pooling operation of the original CBAM easily introduces background noise, we propose AS-CBAM (Adaptive Selection Convolutional Block Attention Module) and combine it with HDC (Hybrid Dilated Convolution) to maximize the receptive field and refine the features. A 1x1 convolution is used to limit the growth in the number of parameters. DIoU-NMS replaces traditional NMS. In addition, a tracking algorithm based on Kalman filtering and Hungarian matching is introduced to improve the system's ability to recognize occluded objects. Compared with the original YOLOv3, the proposed method increases mAP by 1.32% and 1.47% on KITTI and UA-DETRAC, respectively. It runs at 35.07 FPS and shows a larger accuracy improvement (90.36% vs. 85.71%) on Object-Mask, a dataset that focuses on occlusion conditions. The proposed algorithm is therefore more suitable for autonomous driving applications.
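The abstract names DIoU-NMS but gives no implementation details. As a rough illustration only, the following is a minimal NumPy sketch of DIoU-based non-maximum suppression, assuming the commonly used definition DIoU = IoU - d^2 / c^2, where d is the distance between box centers and c is the diagonal of the smallest box enclosing both. The function names `diou` and `diou_nms`, the `[x1, y1, x2, y2]` box layout, and the 0.5 threshold are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def diou(box, boxes):
    """DIoU between one box and an array of boxes, all as [x1, y1, x2, y2].

    Assumes every box has positive width and height.
    """
    # Intersection rectangle and its area (zero when boxes do not overlap).
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)

    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter)

    # Squared distance between the box centers.
    dx = (box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2
    dy = (box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2
    d2 = dx ** 2 + dy ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    ex1 = np.minimum(box[0], boxes[:, 0])
    ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2])
    ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return iou - d2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Keep only candidates whose DIoU with the best box is not too high.
        order = rest[diou(boxes[best], boxes[rest]) <= threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)
```

Because the center-distance penalty lowers the score of boxes whose centers are far apart, two heavily overlapping boxes with distinct centers are less likely to be merged than under plain IoU-based NMS, which is the property usually cited as helpful in occluded scenes.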


Pages: 2070-2091

Page count: 22

References: 38 in total

