A Lightweight and Efficient Distracted Driver Detection Model Fusing Convolutional Neural Network and Vision Transformer

Cited by: 1
Authors
Li, Zhao [1]
Zhao, Xia [2]
Wu, Fuwei [1]
Chen, Dan [3]
Wang, Chang [1]
Affiliations
[1] Changan Univ, Sch Automobile, Xian 710061, Peoples R China
[2] Jiangsu Univ, Automot Engn Res Inst, Zhenjiang 212013, Peoples R China
[3] Changan Univ, Sch Elect & Control Engn, Xian 710061, Peoples R China
Keywords
Feature extraction; Vehicles; Transformers; Convolution; Computational modeling; Accuracy; Convolutional neural networks; Traffic safety; distracted driver recognition; convolutional neural network; vision transformer; lightweight model; multi-scale feature; RECOGNITION; FRAMEWORK;
DOI
10.1109/TITS.2024.3447041
Chinese Library Classification (CLC) number
TU [Building Science];
Discipline classification code
0813;
Abstract
Identifying distracted drivers is crucial for enhancing driving safety and advancing intelligent driver assistance systems. Recently, researchers have applied Convolutional Neural Network (CNN) and Vision Transformer (ViT) models to driver state recognition. However, both kinds of model often suffer from large parameter counts and low detection efficiency. To address these challenges, this study proposes the Convolution Vision Transformer (CoViT) model for distracted driver identification, which combines a Low Complexity Attention Mechanism (LCAM), Multi-scale Dilation Convolution (MSDC), and Depth Separable Convolution (DSC). The CoViT model also has a typical "pyramid" structure, enabling effective feature extraction across different scales. The proposed system is trained and evaluated on the publicly available driving behavior datasets SFD2 and 100-Driver, as well as on data from real-world road experiments. Experimental results show that the CoViT model achieves mean Accuracy (mAcc) scores of 95.17%, 97.89%, and 93.54% on the recorded dataset, the SFD2 dataset, and the 100-Driver dataset, respectively, surpassing similar lightweight models. Ablation experiments further reveal that deep and dilated convolution significantly enhance model performance. In addition, the CoViT model is applicable to real-time driving behavior detection: it has a parameter count of just 1.24M, a reduction of 2.67M compared to MobileNetV3, and an online inference speed of 159.13 Frames Per Second (FPS).
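As a rough illustration of why the convolution types named in the abstract keep a model lightweight, the sketch below counts the weights of a standard convolution and of a depthwise separable one, and computes the effective kernel extent of a dilated convolution. It is not the CoViT implementation: the channel sizes (64 and 128), the 3 x 3 kernel, and the dilation rates (1, 2, 3) are assumptions chosen only for the example, and the function names are hypothetical.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights in a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def dilated_kernel_extent(k, dilation):
    """Effective spatial extent of a dilated k x k kernel; the weight count is unchanged."""
    return dilation * (k - 1) + 1


# Assumed layer sizes, for illustration only (not taken from the paper).
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable)  # 73728 8768

# Assumed dilation rates: larger rates widen the view without adding weights.
print([dilated_kernel_extent(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
```

Under these assumed sizes the separable layer needs roughly one eighth of the weights of the standard layer, and running several dilation rates in parallel covers 3, 5, and 7 pixel extents with the same 3 x 3 weight budget per branch.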
Pages: 19962-19978
Page count: 17