Hierarchical multimodal-fusion of physiological signals for emotion recognition with scenario adaption and contrastive alignment

Cited by: 18
Authors
Tang, Jiehao [1, 2]
Ma, Zhuang [1, 2]
Gan, Kaiyu [1, 2]
Zhang, Jianhua [3]
Yin, Zhong [1, 2, 4]
Affiliations
[1] Univ Shanghai Sci & Technol, Engn Res Ctr Opt Instrument & Syst, Shanghai Key Lab Modern Opt Syst, Minist Educ, Shanghai 200093, Peoples R China
[2] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
[3] Oslo Metropolitan Univ, Dept Comp Sci, OsloMet Artificial Intelligence Lab, N-0130 Oslo, Norway
[4] Jungong Rd 516, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Affective computing; Multimodal fusion; Physiological signal; Deep learning; EEG; Transformer
DOI
10.1016/j.inffus.2023.102129
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Single-modal physiological signals lack the complementary affective responses of the central and peripheral nervous systems, which can limit emotion recognition performance. However, when multiple modalities are integrated, direct fusion may ignore the heterogeneity of the feature domains across modalities. In addition, the distribution of multimodal physiological responses may vary across the affective scenarios used to elicit the same emotional category. Inter-individual variation may also increase because the multimodal features superimpose biometric information. To tackle these issues, we present a hierarchical multimodal network for robust heterogeneous physiological representations (RHPRNet). First, a spatial-frequency pattern extractor identifies electroencephalogram (EEG) representations in both the spatial and frequency domains. Next, inter-domain and inter-modality affective encoders are applied to the statistic-complexity EEG features and the multimodal peripheral features, respectively. All learned representations are integrated via a hierarchical fusion module. To model the multi-peak patterns elicited by different affective scenarios, we designed a scenario-adapting pretraining stage. A random contrastive training loss is also applied to mitigate inter-individual variance. Finally, we evaluated RHPRNet experimentally on three publicly available multimodal databases under two validation approaches.
Pages: 18