Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress

Cited by: 6
Authors
Dogan, Gulin [1 ]
Akbulut, Fatma Patlar [2 ]
Affiliations
[1] Istanbul Kultur Univ, Dept Comp Engn, TR-34158 Istanbul, Turkiye
[2] Istanbul Kultur Univ, Dept Software Engn, TR-34158 Istanbul, Turkiye
Source
NEURAL COMPUTING & APPLICATIONS, 2023, Vol. 35, Issue 34
Keywords
Stress detection; Sequential and non-sequential model; Fine-tuning; Multi-modality; MOMENTARY ASSESSMENT; RECOGNITION; VOICE; FACE;
DOI
10.1007/s00521-023-09036-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mental stress is a significant risk factor for numerous illnesses and can degrade a person's quality of life, including their work and personal relationships. Traditional methods of detecting mental stress, such as interviews and questionnaires, may fail to capture individuals' instantaneous emotional responses. In this study, experience sampling was used to analyze participants' immediate affective responses, providing a more comprehensive and dynamic understanding of their experiences. The WorkStress3D dataset was compiled from 20 participants across three distinct modalities. Over an average of one week, 175 h of data were collected per subject, comprising physiological signals such as BVP, EDA, and body temperature, together with facial expressions and audio. We present a novel fusion model that uses a double-early fusion approach to combine data from multiple modalities. The model's F1 score of 0.94 with a loss of 0.18 is very encouraging, indicating that it can accurately identify and classify varying degrees of stress. Furthermore, we investigate transfer learning to improve the efficacy of our stress detection system; despite our efforts, it did not outperform the fusion model, reaching an accuracy of 0.93 with a loss of 0.17 and illustrating the difficulty of adapting pre-trained models to stress analysis. These results emphasize the significance of multi-modal fusion in stress detection and the importance of selecting a model architecture suited to the task. The proposed fusion model demonstrates its potential for accurate and robust stress classification. This research contributes to the field of stress analysis and to the development of effective stress detection models.
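The abstract does not specify the network layout behind the "double-early fusion" approach. As a rough illustration only, the following is a minimal PyTorch sketch of early fusion by concatenation, where per-modality encoders (biosignal, audio, visual) feed a shared classification head; all class names, feature dimensions, and layer sizes here are hypothetical assumptions, not the authors' implementation.

```python
# Illustrative early-fusion sketch (hypothetical; not the authors' architecture).
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps raw per-modality features to a fixed-size embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class EarlyFusionStressNet(nn.Module):
    """Concatenates biosignal, audio, and visual embeddings (early fusion)
    and classifies stress level with a shared head."""
    def __init__(self, bio_dim: int = 12, audio_dim: int = 40,
                 visual_dim: int = 512, n_classes: int = 2, emb_dim: int = 64):
        super().__init__()
        self.bio = ModalityEncoder(bio_dim, emb_dim)       # BVP/EDA/temperature features
        self.audio = ModalityEncoder(audio_dim, emb_dim)   # e.g., spectral audio features
        self.visual = ModalityEncoder(visual_dim, emb_dim) # e.g., face embeddings
        self.head = nn.Sequential(
            nn.Linear(emb_dim * 3, 64), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, bio, audio, visual):
        # Early fusion: concatenate modality embeddings into one feature vector.
        z = torch.cat([self.bio(bio), self.audio(audio), self.visual(visual)], dim=-1)
        return self.head(z)

if __name__ == "__main__":
    model = EarlyFusionStressNet()
    logits = model(torch.randn(8, 12), torch.randn(8, 40), torch.randn(8, 512))
    print(logits.shape)  # torch.Size([8, 2])
```

A transfer-learning variant, as explored in the paper, would typically swap the encoders for pre-trained feature extractors and fine-tune only the fusion head; the abstract reports that this did not surpass the fusion model trained from scratch.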
Pages: 24435-24454
Page count: 20
Related Papers
50 records in total
  • [1] Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress
    Dogan, Gulin
    Akbulut, Fatma Patlar
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 (34) : 24435 - 24454
  • [2] Multi-Modal Anomaly Detection by Using Audio and Visual Cues
    Rehman, Ata-Ur
    Ullah, Hafiz Sami
    Farooq, Haroon
    Khan, Muhammad Salman
    Mahmood, Tayyeb
    Khan, Hafiz Owais Ahmed
    IEEE ACCESS, 2021, 9 : 30587 - 30603
  • [3] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (12) : 9505 - 9513
  • [4] Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
    Lei, Han
    Chen, Ning
    INTERSPEECH 2022, 2022, : 4157 - 4161
  • [5] Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation
    Wang, Xiaoyu
    Kong, Xiangyu
    Peng, Xiulian
    Lu, Yan
    INTERSPEECH 2022, 2022, : 886 - 890
  • [6] Online video visual relation detection with hierarchical multi-modal fusion
    He, Yuxuan
    Gan, Ming-Gang
    Ma, Qianzhao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (24) : 65707 - 65727
  • [7] Video Visual Relation Detection via Multi-modal Feature Fusion
    Sun, Xu
    Ren, Tongwei
    Zi, Yuan
    Wu, Gangshan
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2657 - 2661
  • [8] Generalized concept overlay for semantic multi-modal analysis of audio-visual content
    Mezaris, Vasileios
    Gidaros, Spyros
    Kompatsiaris, Ioannis
    PROCEEDINGS 2009 FOURTH INTERNATIONAL WORKSHOP ON SEMANTIC MEDIA ADAPTATION AND PERSONALIZATION, 2009, : 27 - 32
  • [9] Learning Visual Emotion Distributions via Multi-Modal Features Fusion
    Zhao, Sicheng
    Ding, Guiguang
    Gao, Yue
    Han, Jungong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 369 - 377
  • [10] IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI-MODAL LEARNING
    Oruganti, Ram Manohar
    Sah, Shagan
    Pillai, Suhas
    Ptucha, Raymond
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3613 - 3617