A Unified Biosensor-Vision Multi-Modal Transformer network for emotion recognition

Cited by: 0
Authors
Ali, Kamran [1 ]
Hughes, Charles E. [1 ]
Institutions
[1] Univ Cent Florida, Dept Comp Sci, Synthet Real Lab, Orlando, FL 32816 USA
Funding
U.S. National Science Foundation;
关键词
Emotion recognition; Biosensor signal; Transformers; Representation learning; SIGNALS; EEG;
DOI
10.1016/j.bspc.2024.107232
CLC Classification
R318 [Biomedical Engineering];
Subject Classification
0831;
Abstract
The development of transformer-based models has led to significant advances on a wide range of vision and NLP research challenges. This progress, however, has not been effectively carried over to biosensor/physiological-signal-based emotion recognition, largely because transformers require large amounts of training data and most biosensor datasets are too small to train such models. To address this issue, we propose a novel Unified Biosensor-Vision Multimodal Transformer (UBVMT) architecture, which enables self-supervised pretraining by extracting Remote Photoplethysmography (rPPG) signals from videos in the large CMU-MOSEI dataset. UBVMT classifies emotions in the arousal-valence space by combining a 2D representation of ECG/PPG signals with facial information. In contrast to modality-specific architectures, the unified UBVMT architecture consists of homogeneous transformer blocks that take as input the image-based representation of the biosensor signal and the corresponding face information for emotion representation learning. This minimally modality-specific design reduces the number of parameters in UBVMT by half compared to conventional multimodal transformer networks, enabling its use in our web-based system, where loading large models poses significant memory challenges. UBVMT is pretrained in a self-supervised manner using masked autoencoding, which reconstructs masked patches of video frames and of 2D scalogram images of ECG/PPG signals, together with contrastive modeling that aligns face and ECG/PPG data. Extensive experiments on publicly available datasets show that our UBVMT-based model achieves results comparable to state-of-the-art techniques.
Pages: 11
Related Papers (50 total)
  • [1] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [2] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5318 - 5329
  • [3] A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
    Cheng, Cheng
    Liu, Wenzhe
    Fan, Zhaoxin
    Feng, Lin
    Jia, Ziyu
    NEURAL NETWORKS, 2024, 172
  • [4] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [5] Dense Attention Memory Network for Multi-modal emotion recognition
    Ma, Gailing
    Guo, Xiao
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 48 - 53
  • [6] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [7] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [8] UniColor: A Unified Framework for Multi-Modal Colorization with Transformer
    Huang, Zhitong
    Zhao, Nanxuan
    Liao, Jing
    ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (06):
  • [9] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368