MCS-YOLO: A Multiscale Object Detection Method for Autonomous Driving Road Environment Recognition

Cited by: 27
Authors
Cao, Yining [1 ]
Li, Chao [1 ]
Peng, Yakun [1 ]
Ru, Huiying [2 ]
Affiliations
[1] Hebei Univ Architecture, Coll Informat Engn, Zhangjiakou 075000, Peoples R China
[2] Hebei Univ Architecture, Coll Sci, Zhangjiakou 075000, Peoples R China
Keywords
Task analysis; Object detection; Feature extraction; Autonomous vehicles; Transformers; Road traffic; Detectors; Coordinate attention mechanisms; Autonomous driving; Road environmental object detection; Swin Transformer; YOLOv5
DOI
10.1109/ACCESS.2023.3252021
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Object detection and recognition in road scenes are crucial tasks of the environmental perception system in autonomous driving. Low inference speed and accuracy of object detection models hinder the development of autonomous driving technology, and improving detection accuracy and speed simultaneously remains a challenging task. To address these problems, we propose the MCS-YOLO algorithm. First, a coordinate attention module is inserted into the backbone to aggregate the feature map's spatial coordinate and cross-channel information. Second, we design a multiscale small object detection structure to improve sensitivity to dense small objects. Finally, we apply the Swin Transformer structure to the CNN so that the network can focus on contextual spatial information. In ablation studies on the autonomous driving dataset BDD100K, the MCS-YOLO algorithm achieves a mean average precision of 53.6% and a recall of 48.3%, which are 4.3% and 3.9% higher than those of the YOLOv5s algorithm, respectively. In addition, it reaches a real-time detection speed of 55 frames per second in real scenes. The results show that MCS-YOLO is effective and superior for the autonomous driving object detection task.
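
Of the three modifications described in the abstract, the coordinate attention module is the most self-contained, so a minimal PyTorch sketch of such a block is given below to illustrate how it could be inserted into a YOLOv5-style backbone. The class name CoordinateAttention, the reduction ratio, and the Hardswish activation are assumptions made for this sketch; they are not taken from the paper's released code.

# Minimal sketch of a coordinate attention block (after Hou et al., CVPR 2021),
# the kind of module the abstract describes inserting into the YOLOv5 backbone.
# Names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Pool along each spatial axis separately to keep positional information.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                          # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (B, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)              # (B, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))         # shared 1x1 transform
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)                 # back to (B, mid, 1, W)
        a_h = torch.sigmoid(self.conv_h(x_h))         # attention along height
        a_w = torch.sigmoid(self.conv_w(x_w))         # attention along width
        return x * a_h * a_w                          # reweight the feature map

# Example: apply the block to a backbone feature map.
if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)
    print(CoordinateAttention(256)(feat).shape)       # torch.Size([1, 256, 40, 40])

In a YOLOv5-style backbone such a block would typically follow a CSP/C3 stage, so that the feature maps passed to the detection head already carry the position-aware channel weighting the abstract refers to.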
Pages: 22342-22354
Number of pages: 13