Driver multi-task emotion recognition network based on multi-modal facial video analysis

Times Cited: 0
Authors
Xiang, Guoliang [1 ]
Yao, Song [1 ]
Wu, Xianhui [1 ]
Deng, Hanwen [1 ]
Wang, Guojie [2 ]
Liu, Yu [2 ]
Li, Fan [3 ]
Peng, Yong [1 ]
Affiliations
[1] Cent South Univ, Sch Traff & Transportat Engn, Key Lab Traff Safety Track, Minist Educ, Changsha 410075, Peoples R China
[2] China Automot Engn Res Inst Co Ltd, Chongqing 401122, Peoples R China
[3] Hunan Univ, State Key Lab Adv Design & Manufacture Vehicle Bod, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video analysis; Multi-modal information fusion; Multi-task learning; Driver emotion; Remote physiological signal extraction;
DOI
10.1016/j.patcog.2024.111241
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Driver emotion recognition is crucial for enhancing safety and user experience in driving scenarios. However, current emotion recognition methods often rely on a single modality and a single-task setup, leading to suboptimal performance in driving scenarios. To address this, this paper proposes a driver multi-task emotion recognition method based on multi-modal facial video analysis (MER-MFVA). The method extracts facial expression features and remote photoplethysmography (rPPG) signals from driver facial videos. The facial expression features, which include facial action units and eye movement information, represent the driver's external characteristics. The rPPG information, representing the driver's internal characteristics, is enhanced through a purpose-designed dual-path Transformer network and an introduced focus module. We also propose a cross-modal mutual attention mechanism that fuses the multi-modal features by computing mutual attention between the facial expression features and the rPPG information. For the final task output, we employ a multi-task learning mechanism, setting discrete emotion recognition as the primary task and emotion valence recognition, emotion arousal recognition, and the aforementioned rPPG signal extraction as auxiliary tasks, which facilitates effective information sharing across tasks. Experimental results on the established driver emotion dataset demonstrate that the proposed method significantly improves driver emotion recognition performance, achieving an accuracy of 86.98% and an F1 score of 85.83% on the primary task, validating the effectiveness of the approach.
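The cross-modal mutual attention fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration only, not the authors' implementation: the scaled dot-product form of the attention, the feature dimensions, and the concatenation-based fusion are assumptions made for the sake of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_mutual_attention(expr, rppg):
    """Hypothetical sketch of mutual attention between two modalities.

    expr: (T, d) facial-expression features (action units, eye movement).
    rppg: (T, d) rPPG features from the same video segment.
    Each modality forms the queries against the other modality's
    keys/values; the fused representation concatenates both results.
    """
    d = expr.shape[-1]
    # Expression queries attend over rPPG keys/values.
    attn_e = softmax(expr @ rppg.T / np.sqrt(d)) @ rppg
    # rPPG queries attend over expression keys/values.
    attn_r = softmax(rppg @ expr.T / np.sqrt(d)) @ expr
    return np.concatenate([attn_e, attn_r], axis=-1)

rng = np.random.default_rng(0)
expr = rng.standard_normal((8, 16))   # 8 time steps, 16-dim features
rppg = rng.standard_normal((8, 16))
fused = cross_modal_mutual_attention(expr, rppg)
print(fused.shape)  # (8, 32)
```

In the paper itself this fusion feeds a multi-task head (discrete emotion as the primary task; valence, arousal, and rPPG extraction as auxiliary tasks); the sketch above covers only the fusion step.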
Pages: 10