MSF-Net: Multi-stage fusion network for emotion recognition from multimodal signals in scalable healthcare

Times Cited: 0
Authors
Islam, Md. Milon [1 ]
Karray, Fakhri [1 ,2 ]
Muhammad, Ghulam [3 ]
Affiliations
[1] Univ Waterloo, Ctr Pattern Anal & Machine Intelligence, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Engn, Riyadh 11543, Saudi Arabia
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Multimodal emotion recognition; Multi-stage fusion; Vision transformer; Bi-directional Gated Recurrent Unit; Triplet attention; Scalable healthcare;
DOI
10.1016/j.inffus.2025.103028
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic emotion recognition has attracted significant interest in healthcare, thanks to remarkable recent developments in smart and innovative technologies. A real-time emotion recognition system enables continuous monitoring, comprehension, and enhancement of an individual's capacities, along with ongoing guidance for improving quality of life and well-being in the context of personalized healthcare. Multimodal emotion recognition poses a significant challenge: efficiently exploiting the diverse modalities present in the data. In this article, we introduce a Multi-Stage Fusion Network (MSF-Net) for emotion recognition that extracts multimodal information and achieves strong performance. We propose a transformer-based structure to extract deep features from facial expressions. We exploit two visual descriptors, Local Binary Pattern (LBP) and Oriented FAST and Rotated BRIEF (ORB), to extract computer vision-based features from the facial videos. A feature-level fusion network integrates the features extracted by these modules and directs the output to a triplet attention module. This module employs a three-branch architecture that computes attention weights to capture cross-dimensional interactions efficiently. The temporal dependencies in physiological signals are modeled by a Bi-directional Gated Recurrent Unit (Bi-GRU) in the forward and backward directions at each time step. Lastly, the output feature representations from the triplet attention module and the high-level patterns extracted by the Bi-GRU are fused and fed into the classification module to recognize emotion. Extensive experimental evaluations reveal that the proposed MSF-Net outperforms state-of-the-art approaches on two popular datasets, BioVid Emo DB and MGEED. Finally, we tested the proposed MSF-Net in an Internet of Things environment to facilitate real-world scalable smart healthcare applications.
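The abstract names the main building blocks but gives no implementation detail. As a purely illustrative aid, the sketches below show how such a pipeline could be assembled. First, a per-frame handcrafted descriptor combining an LBP histogram with mean-pooled ORB keypoint descriptors; the neighbourhood sizes, bin counts, and pooling choice are assumptions for the sketch, not values taken from the paper.

```python
# Hypothetical per-frame LBP + ORB descriptor; all dimensions are illustrative.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def handcrafted_descriptors(gray_frame: np.ndarray) -> np.ndarray:
    """LBP histogram + mean-pooled ORB descriptors for one grayscale face frame."""
    # Uniform LBP with 8 neighbours at radius 1 yields codes in [0, 9]
    lbp = local_binary_pattern(gray_frame, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # ORB keypoint descriptors (32 bytes each), mean-pooled to a fixed vector
    orb = cv2.ORB_create(nfeatures=64)
    _, desc = orb.detectAndCompute(gray_frame, None)
    orb_vec = desc.mean(axis=0) / 255.0 if desc is not None else np.zeros(32)
    return np.concatenate([hist, orb_vec])  # 42-D handcrafted descriptor
```

Second, a minimal PyTorch sketch of the multi-stage fusion idea: feature-level fusion of transformer features with the handcrafted descriptors, a triplet attention module following the standard three-branch design (Z-pool, 7x7 convolution, sigmoid gate over rotated dimension pairs), and a Bi-GRU over physiological signals whose final forward and backward states are fused with the attended visual features for classification. The module sizes (vit_dim, the 16x4x4 reshape, the channel count, the five classes) are assumptions made only so the sketch is runnable, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooled maps along the leading feature dim."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0],
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-pool -> 7x7 conv -> sigmoid gate, as in standard triplet attention."""
    def __init__(self, kernel: int = 7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)
    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Three branches attend over the (C,W), (C,H), and (H,W) dimension pairs."""
    def __init__(self):
        super().__init__()
        self.cw, self.hc, self.hw = AttentionGate(), AttentionGate(), AttentionGate()
    def forward(self, x):                                    # x: (N, C, H, W)
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        x_hc = self.hc(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (x_cw + x_hc + self.hw(x)) / 3.0              # average the branches

class MSFNetSketch(nn.Module):
    def __init__(self, vit_dim=768, desc_dim=42, phys_dim=3, hidden=128, n_classes=5):
        super().__init__()
        # Stage 1: feature-level fusion of ViT features and LBP/ORB descriptors
        self.visual_fuse = nn.Linear(vit_dim + desc_dim, 256)
        self.triplet = TripletAttention()
        # Stage 2: Bi-GRU models physiological signals forward and backward
        self.bigru = nn.GRU(phys_dim, hidden, batch_first=True, bidirectional=True)
        # Stage 3: fuse attended visual features with Bi-GRU states and classify
        self.classifier = nn.Sequential(
            nn.Linear(256 + 2 * hidden, 128), nn.ReLU(), nn.Linear(128, n_classes))
    def forward(self, vit_feat, desc_feat, phys_seq):
        v = self.visual_fuse(torch.cat([vit_feat, desc_feat], dim=-1))
        v = self.triplet(v.view(-1, 16, 4, 4)).flatten(1)    # 16*4*4 == 256
        _, h = self.bigru(phys_seq)                          # h: (2, N, hidden)
        p = torch.cat([h[0], h[1]], dim=-1)                  # final fwd + bwd states
        return self.classifier(torch.cat([v, p], dim=-1))
```

Under these assumed shapes, `MSFNetSketch()(torch.randn(4, 768), torch.randn(4, 42), torch.randn(4, 200, 3))` returns a (4, 5) logit tensor, with the 200-step sequence standing in for windowed physiological channels.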
Pages: 15