A Unified Biosensor-Vision Multi-Modal Transformer network for emotion recognition

Cited by: 0
Authors
Ali, Kamran [1 ]
Hughes, Charles E. [1 ]
Institutions
[1] Univ Cent Florida, Dept Comp Sci, Synthet Real Lab, Orlando, FL 32816 USA
Funding
U.S. National Science Foundation;
关键词
Emotion recognition; Biosensor signal; Transformers; Representation learning; SIGNALS; EEG;
DOI
10.1016/j.bspc.2024.107232
CLC Classification
R318 [Biomedical Engineering];
Subject Classification
0831;
Abstract
The development of transformer-based models has led to significant advances on a wide range of vision and NLP research challenges. This progress, however, has not been effectively carried over to biosensor/physiological-signal-based emotion recognition, largely because transformers require large amounts of training data and most biosensor datasets are too small to train such models. To address this issue, we propose a novel Unified Biosensor-Vision Multimodal Transformer (UBVMT) architecture, which enables self-supervised pretraining by extracting Remote Photoplethysmography (rPPG) signals from videos in the large CMU-MOSEI dataset. UBVMT classifies emotions in the arousal-valence space by combining a 2D representation of ECG/PPG signals with facial information. In contrast to modality-specific architectures, the unified UBVMT architecture consists of homogeneous transformer blocks that take as input the image-based representation of the biosensor signal and the corresponding face information for emotion representation learning. This minimally modality-specific design reduces the number of parameters in UBVMT by half compared to conventional multimodal transformer networks, enabling its use in our web-based system, where loading large models poses significant memory challenges. UBVMT is pretrained in a self-supervised manner using masked autoencoding, which reconstructs masked patches of video frames and of 2D scalogram images of ECG/PPG signals, together with contrastive modeling that aligns face and ECG/PPG data. Extensive experiments on publicly available datasets show that our UBVMT-based model achieves results comparable to state-of-the-art techniques.
Pages: 11
Related Papers (50 total)
  • [1] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [2] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5318 - 5329
  • [3] A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
    Cheng, Cheng
    Liu, Wenzhe
    Fan, Zhaoxin
    Feng, Lin
    Jia, Ziyu
    NEURAL NETWORKS, 2024, 172
  • [4] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [5] Dense Attention Memory Network for Multi-modal emotion recognition
    Ma, Gailing
    Guo, Xiao
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 48 - 53
  • [6] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [7] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [8] UniColor: A Unified Framework for Multi-Modal Colorization with Transformer
    Huang, Zhitong
    Zhao, Nanxuan
    Liao, Jing
    ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (06):
  • [9] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368