MSF-Net: Multi-stage fusion network for emotion recognition from multimodal signals in scalable healthcare

Cited by: 0

Authors:

Islam, Md. Milon ^{[1]}

Karray, Fakhri ^{[1,2]}

Muhammad, Ghulam ^{[3]}

Affiliations:

[1] Univ Waterloo, Ctr Pattern Anal & Machine Intelligence, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada

[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates

[3] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Engn, Riyadh 11543, Saudi Arabia

Source:

INFORMATION FUSION | 2025 / Vol. 119

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Multimodal emotion recognition; Multi-stage fusion; Vision transformer; Bi-directional Gated Recurrent Unit; Triplet attention; Scalable healthcare;

DOI:

10.1016/j.inffus.2025.103028

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic emotion recognition has attracted significant interest in healthcare, thanks to recent developments in smart and innovative technologies. A real-time emotion recognition system allows for continuous monitoring, comprehension, and enhancement of the physical entity's capacities, along with continuing advice for enhancing quality of life and well-being in the context of personalized healthcare. Multimodal emotion recognition presents a significant challenge in terms of efficiently using the diverse modalities present in the data. In this article, we introduce a Multi-Stage Fusion Network (MSF-Net) for emotion recognition that extracts multimodal information and achieves strong performance. We propose using a transformer-based structure to extract deep features from facial expressions. We exploit two visual descriptors, local binary pattern and Oriented FAST and Rotated BRIEF, to retrieve computer vision-based features from the facial videos. A feature-level fusion network integrates the features extracted by these modules and directs the output into the triplet attention module. This module employs a three-branch architecture to compute attention weights that capture cross-dimensional interactions efficiently. The temporal dependencies in physiological signals are modeled by a Bi-directional Gated Recurrent Unit (Bi-GRU) in forward and backward directions at each time step. Lastly, the output feature representations from the triplet attention module and the high-level patterns extracted by the Bi-GRU are fused and fed into the classification module to recognize emotion. Extensive experimental evaluations show that the proposed MSF-Net outperforms the state-of-the-art approaches on two popular datasets, BioVid Emo DB and MGEED. Finally, we tested the proposed MSF-Net in an Internet of Things environment to facilitate real-world scalable smart healthcare applications.
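
The abstract names local binary pattern (LBP) as one of the hand-crafted descriptors and describes feature-level fusion of the extracted features. As a rough illustration only, the sketch below computes a basic 3x3 LBP histogram in plain Python and fuses feature vectors by concatenation. The neighbor ordering, the `>=` comparison, the 256-bin histogram, the concatenation rule, and the function names `lbp_codes`, `lbp_histogram`, and `fuse` are all assumptions made for this example; the record does not state which LBP variant or fusion operator MSF-Net uses.

```python
from typing import List, Sequence

# Neighbor offsets (row, col), clockwise from the top-left pixel.
# The bit order is an assumption; any fixed order gives a valid LBP code.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the 8-bit LBP code of every interior pixel of a grayscale image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBORS):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0.0] * 256
    flat = [code for row in lbp_codes(image) for code in row]
    for code in flat:
        hist[code] += 1.0
    total = len(flat)
    return [h / total for h in hist] if total else hist


def fuse(*feature_vectors: Sequence[float]) -> List[float]:
    """Feature-level fusion by plain concatenation (an assumed fusion rule)."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


if __name__ == "__main__":
    patch = [[5, 9, 1],
             [4, 6, 7],
             [2, 8, 3]]
    print(lbp_codes(patch))
    # Hypothetical stand-in for features from another branch of the network.
    other_features = [0.1, 0.2, 0.3]
    print(len(fuse(lbp_histogram(patch), other_features)))
```

In practice, a face crop from each video frame would be passed to `lbp_histogram`, and the resulting vector would be fused with the features from the other branches before any attention step.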


Pages: 15
