Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress

Cited by: 6
Authors
Dogan, Gulin [1 ]
Akbulut, Fatma Patlar [2 ]
Affiliations
[1] Istanbul Kultur Univ, Dept Comp Engn, TR-34158 Istanbul, Turkiye
[2] Istanbul Kultur Univ, Dept Software Engn, TR-34158 Istanbul, Turkiye
Source
NEURAL COMPUTING & APPLICATIONS, 2023, Vol. 35, Issue 34
Keywords
Stress detection; Sequential and non-sequential model; Fine-tuning; Multi-modality; MOMENTARY ASSESSMENT; RECOGNITION; VOICE; FACE;
DOI
10.1007/s00521-023-09036-4
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mental stress is a significant risk factor for numerous illnesses and can degrade a person's quality of life, including their work and personal relationships. Traditional methods of detecting mental stress, such as interviews and questionnaires, may fail to capture individuals' instantaneous emotional responses. In this study, experience sampling was used to analyze participants' immediate affective responses, providing a more comprehensive and dynamic understanding of their experiences. The WorkStress3D dataset was compiled from 20 participants across three distinct modalities. Over an average of one week, 175 h of data were collected per subject, comprising physiological signals such as BVP, EDA, and body temperature, together with facial expressions and audio. We present a novel fusion model that uses a double-early fusion approach to combine data from multiple modalities. The model's F1 score of 0.94 with a loss of 0.18 is very encouraging, indicating that it can accurately identify and classify varying degrees of stress. Furthermore, we investigate transfer learning to improve the efficacy of our stress detection system; despite our efforts, it did not outperform the fusion model, reaching an accuracy of 0.93 with a loss of 0.17 and illustrating the difficulty of adapting pre-trained models to stress analysis. These results emphasize the significance of multi-modal fusion in stress detection and the importance of selecting a model architecture suited to the task. The proposed fusion model demonstrates its potential for accurate and robust stress classification. This research contributes to the field of stress analysis and to the development of effective stress detection models.
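The abstract does not specify the network layout behind the "double-early fusion" approach. As a rough illustration only, the following is a minimal PyTorch sketch of early fusion by concatenation, where per-modality encoders (biosignal, audio, visual) feed a shared classification head; all class names, feature dimensions, and layer sizes here are hypothetical assumptions, not the authors' implementation.

```python
# Illustrative early-fusion sketch (hypothetical; not the authors' architecture).
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps raw per-modality features to a fixed-size embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class EarlyFusionStressNet(nn.Module):
    """Concatenates biosignal, audio, and visual embeddings (early fusion)
    and classifies stress level with a shared head."""
    def __init__(self, bio_dim: int = 12, audio_dim: int = 40,
                 visual_dim: int = 512, n_classes: int = 2, emb_dim: int = 64):
        super().__init__()
        self.bio = ModalityEncoder(bio_dim, emb_dim)       # BVP/EDA/temperature features
        self.audio = ModalityEncoder(audio_dim, emb_dim)   # e.g., spectral audio features
        self.visual = ModalityEncoder(visual_dim, emb_dim) # e.g., face embeddings
        self.head = nn.Sequential(
            nn.Linear(emb_dim * 3, 64), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, bio, audio, visual):
        # Early fusion: concatenate modality embeddings into one feature vector.
        z = torch.cat([self.bio(bio), self.audio(audio), self.visual(visual)], dim=-1)
        return self.head(z)

if __name__ == "__main__":
    model = EarlyFusionStressNet()
    logits = model(torch.randn(8, 12), torch.randn(8, 40), torch.randn(8, 512))
    print(logits.shape)  # torch.Size([8, 2])
```

A transfer-learning variant, as explored in the paper, would typically swap the encoders for pre-trained feature extractors and fine-tune only the fusion head; the abstract reports that this did not surpass the fusion model trained from scratch.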
Pages: 24435-24454
Page count: 20
Related Papers
50 records in total
  • [1] Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress
    Dogan, Gulin
    Akbulut, Fatma Patlar
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 (34) : 24435 - 24454
  • [2] Multi-Modal Anomaly Detection by Using Audio and Visual Cues
    Rehman, Ata-Ur
    Ullah, Hafiz Sami
    Farooq, Haroon
    Khan, Muhammad Salman
    Mahmood, Tayyeb
    Khan, Hafiz Owais Ahmed
    IEEE ACCESS, 2021, 9 : 30587 - 30603
  • [3] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (12) : 9505 - 9513
  • [4] Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
    Lei, Han
    Chen, Ning
    INTERSPEECH 2022, 2022, : 4157 - 4161
  • [5] Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation
    Wang, Xiaoyu
    Kong, Xiangyu
    Peng, Xiulian
    Lu, Yan
    INTERSPEECH 2022, 2022, : 886 - 890
  • [6] Online video visual relation detection with hierarchical multi-modal fusion
    He, Yuxuan
    Gan, Ming-Gang
    Ma, Qianzhao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (24) : 65707 - 65727
  • [7] Video Visual Relation Detection via Multi-modal Feature Fusion
    Sun, Xu
    Ren, Tongwei
    Zi, Yuan
    Wu, Gangshan
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2657 - 2661
  • [8] Generalized concept overlay for semantic multi-modal analysis of audio-visual content
    Mezaris, Vasileios
    Gidaros, Spyros
    Kompatsiaris, Ioannis
    PROCEEDINGS 2009 FOURTH INTERNATIONAL WORKSHOP ON SEMANTIC MEDIA ADAPTATION AND PERSONALIZATION, 2009, : 27 - 32
  • [9] Learning Visual Emotion Distributions via Multi-Modal Features Fusion
    Zhao, Sicheng
    Ding, Guiguang
    Gao, Yue
    Han, Jungong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 369 - 377
  • [10] IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI-MODAL LEARNING
    Oruganti, Ram Manohar
    Sah, Shagan
    Pillai, Suhas
    Ptucha, Raymond
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3613 - 3617