Driver multi-task emotion recognition network based on multi-modal facial video analysis

Times Cited: 0
Authors
Xiang, Guoliang [1 ]
Yao, Song [1 ]
Wu, Xianhui [1 ]
Deng, Hanwen [1 ]
Wang, Guojie [2 ]
Liu, Yu [2 ]
Li, Fan [3 ]
Peng, Yong [1 ]
Affiliations
[1] Cent South Univ, Sch Traff & Transportat Engn, Key Lab Traff Safety Track, Minist Educ, Changsha 410075, Peoples R China
[2] China Automot Engn Res Inst Co Ltd, Chongqing 401122, Peoples R China
[3] Hunan Univ, State Key Lab Adv Design & Manufacture Vehicle Bod, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video analysis; Multi-modal information fusion; Multi-task learning; Driver emotion; Remote physiological signal extraction;
DOI
10.1016/j.patcog.2024.111241
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Driver emotion recognition is crucial for enhancing safety and user experience in driving scenarios. However, current emotion recognition methods often rely on a single modality and a single-task setup, leading to suboptimal performance in driving scenarios. To address this, this paper proposes a driver multi-task emotion recognition method based on multi-modal facial video analysis (MER-MFVA). The method extracts facial expression features and remote photoplethysmography (rPPG) signals from driver facial videos. The facial expression features, which include facial action units and eye movement information, represent the driver's external characteristics. The rPPG information, representing the driver's internal characteristics, is enhanced through a purpose-designed dual-path Transformer network and an introduced focus module. We also propose a cross-modal mutual attention mechanism that fuses the multi-modal features by computing mutual attention between the facial expression features and the rPPG information. For the final task output, we employ a multi-task learning mechanism, setting discrete emotion recognition as the primary task and emotion valence recognition, emotion arousal recognition, and the aforementioned rPPG signal extraction as auxiliary tasks, which facilitates effective information sharing across tasks. Experimental results on the established driver emotion dataset demonstrate that the proposed method significantly improves driver emotion recognition performance, achieving an accuracy of 86.98% and an F1 score of 85.83% on the primary task, validating the effectiveness of the approach.
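The cross-modal mutual attention fusion described in the abstract can be sketched as follows. This is a minimal NumPy illustration only, not the authors' implementation: the scaled dot-product form of the attention, the feature dimensions, and the concatenation-based fusion are assumptions made for the sake of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_mutual_attention(expr, rppg):
    """Hypothetical sketch of mutual attention between two modalities.

    expr: (T, d) facial-expression features (action units, eye movement).
    rppg: (T, d) rPPG features from the same video segment.
    Each modality forms the queries against the other modality's
    keys/values; the fused representation concatenates both results.
    """
    d = expr.shape[-1]
    # Expression queries attend over rPPG keys/values.
    attn_e = softmax(expr @ rppg.T / np.sqrt(d)) @ rppg
    # rPPG queries attend over expression keys/values.
    attn_r = softmax(rppg @ expr.T / np.sqrt(d)) @ expr
    return np.concatenate([attn_e, attn_r], axis=-1)

rng = np.random.default_rng(0)
expr = rng.standard_normal((8, 16))   # 8 time steps, 16-dim features
rppg = rng.standard_normal((8, 16))
fused = cross_modal_mutual_attention(expr, rppg)
print(fused.shape)  # (8, 32)
```

In the paper itself this fusion feeds a multi-task head (discrete emotion as the primary task; valence, arousal, and rPPG extraction as auxiliary tasks); the sketch above covers only the fusion step.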
Pages: 10