MCS-YOLO: A Multiscale Object Detection Method for Autonomous Driving Road Environment Recognition

Cited by: 27
Authors
Cao, Yining [1 ]
Li, Chao [1 ]
Peng, Yakun [1 ]
Ru, Huiying [2 ]
Affiliations
[1] Hebei Univ Architecture, Coll Informat Engn, Zhangjiakou 075000, Peoples R China
[2] Hebei Univ Architecture, Coll Sci, Zhangjiakou 075000, Peoples R China
Keywords
Task analysis; Object detection; Feature extraction; Autonomous vehicles; Transformers; Road traffic; Detectors; Coordinate attention mechanisms; autonomous driving; road environmental object detection; Swin Transformer; YOLOv5
DOI
10.1109/ACCESS.2023.3252021
CLC number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Object detection and recognition in road scenes are crucial tasks for the environmental perception system of autonomous driving. The low inference speed and accuracy of existing object detection models hinder the development of autonomous driving technology, and improving detection accuracy and speed remains a challenging task. To address these problems, we propose the MCS-YOLO algorithm. First, a coordinate attention module is inserted into the backbone to aggregate the feature map's spatial coordinate and cross-channel information. Second, we design a multiscale small-object detection structure to improve recognition sensitivity for dense small objects. Finally, we apply the Swin Transformer structure to the CNN so that the network can focus on contextual spatial information. In ablation studies on the autonomous driving dataset BDD100K, the MCS-YOLO algorithm achieves a mean average precision of 53.6% and a recall of 48.3%, which are 4.3% and 3.9% better than the YOLOv5s algorithm, respectively. In addition, it achieves a real-time detection speed of 55 frames per second in a real scene. The results show that the MCS-YOLO algorithm is effective and superior for the task of autonomous driving object detection.
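The coordinate attention module named in the abstract factorizes attention into two direction-aware pooling branches so that both cross-channel and positional information are captured. Below is a minimal PyTorch sketch of such a block in the style of Hou et al. (CVPR 2021); the class name, reduction ratio, and Hardswish activation are illustrative assumptions, not details confirmed by this paper.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Sketch of a coordinate attention block: pool along each spatial
    axis separately, encode jointly, then re-weight the input."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()  # activation choice is an assumption
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        xh = self.pool_h(x)                      # (N, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1)
        # Encode both directional descriptors with one shared 1x1 conv.
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (N, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * ah * aw  # broadcast height- and width-wise attention
```

As a quick check, `CoordinateAttention(256)(torch.randn(1, 256, 40, 40))` returns a tensor of the same shape, so a block like this can be dropped between backbone stages without changing feature-map sizes.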
Pages: 22342-22354
Page count: 13