A Unified Biosensor-Vision Multi-Modal Transformer Network for Emotion Recognition

Cited: 0
Authors
Ali, Kamran [1]
Hughes, Charles E. [1]
Affiliations
[1] Univ Cent Florida, Dept Comp Sci, Synthet Real Lab, Orlando, FL 32816 USA
Funding
US National Science Foundation;
Keywords
Emotion recognition; Biosensor signal; Transformers; Representation learning; SIGNALS; EEG;
DOI
10.1016/j.bspc.2024.107232
CLC Number
R318 [Biomedical Engineering];
Subject Classification
0831;
Abstract
The development of transformer-based models has resulted in significant advances in addressing various vision and NLP research challenges. However, this progress has not been effectively carried over to biosensor/physiological-signal-based emotion recognition, largely because transformers require large amounts of training data and most biosensor datasets are too small to train such models. To address this issue, we propose a novel Unified Biosensor-Vision Multimodal Transformer (UBVMT) architecture, which enables self-supervised pretraining by extracting remote photoplethysmography (rPPG) signals from videos in the large CMU-MOSEI dataset. UBVMT classifies emotions in the arousal-valence space by combining a 2D representation of ECG/PPG signals with facial information. In contrast to modality-specific architectures, the unified UBVMT architecture consists of homogeneous transformer blocks that take as input an image-based representation of the biosensor signals and the corresponding face information. This minimal modality-specific design halves the number of parameters relative to conventional multimodal transformer networks, enabling deployment in our web-based system, where loading large models poses significant memory challenges. UBVMT is pretrained in a self-supervised manner using masked autoencoding, which reconstructs masked patches of video frames and 2D scalogram images of ECG/PPG signals, and contrastive modeling, which aligns face and ECG/PPG data. Extensive experiments on publicly available datasets show that our UBVMT-based model produces results comparable to state-of-the-art techniques.
Pages: 11