Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis

Citations: 14
Authors
Fu, Jiamin [1]
Mao, Qirong [1]
Tu, Juanjuan [2]
Zhan, Yongzhao [1]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China
[2] Jiangsu Univ Sci & Technol, Sch Comp Sci & Engn, Zhenjiang, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Multimodal emotion recognition; Multimodal shared feature learning; Multimodal information fusion; Canonical correlation analysis;
DOI
10.1007/s00530-017-0547-8
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multimodal emotion recognition is a challenging research topic that has recently begun to attract the attention of the research community. To better recognize video users' emotions, research on multimodal emotion recognition based on audio and video is essential. The performance of multimodal emotion recognition depends heavily on finding a good shared feature representation. A good shared representation must satisfy two requirements: (1) it captures the characteristics of each modality, and (2) it balances the influence of the different modalities so that the final decision is optimal. In light of this, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis (En-SLDCCA) approach to learn the multimodal shared feature representation. The shared representation is learned in two stages. In the first stage, we pretrain a Sparse Auto-Encoder on unimodal video (or audio), obtaining hidden feature representations of video and audio separately. In the second stage, we compute the correlation coefficients of video and audio with our En-SLDCCA approach and use them to fuse the video and audio features into the shared feature representation. We evaluate our method on the challenging multimodal eNTERFACE'05 database. Experimental results reveal that our method is superior to unimodal video (or audio) and significantly improves multimodal emotion recognition performance compared with the current state of the art.
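To make the two-stage pipeline concrete, here is a minimal sketch in Python. Everything below is illustrative: the toy data, shapes, and hyperparameters are hypothetical; the sparse auto-encoder is a bare-bones NumPy implementation of the standard KL-sparsity formulation; and scikit-learn's plain CCA stands in for En-SLDCCA, whose sparsity and local discriminative terms are not specified in this record.

```python
# Illustrative sketch of the two-stage shared-feature pipeline from the
# abstract. Shapes and hyperparameters are hypothetical; standard CCA
# replaces the paper's En-SLDCCA, whose exact formulation is not given here.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Toy unimodal inputs (stand-ins for real video and audio descriptors).
n_samples = 200
X_video = rng.normal(size=(n_samples, 100))
X_audio = rng.normal(size=(n_samples, 60))

def sparse_autoencoder(X, n_hidden, n_epochs=50, lr=1e-2, rho=0.05, beta=0.1):
    """Tiny sparse auto-encoder trained by plain gradient descent.
    Stands in for the per-modality pretraining stage; a real system
    would use a deep-learning framework."""
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
    for _ in range(n_epochs):
        H = 1.0 / (1.0 + np.exp(-X @ W1))   # sigmoid hidden activations
        err = H @ W2 - X                     # linear reconstruction error
        rho_hat = H.mean(axis=0).clip(1e-6, 1 - 1e-6)
        # KL-sparsity gradient pushes mean hidden activations toward rho.
        sparsity_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
        dH = err @ W2.T + sparsity_grad
        W2 -= lr * (H.T @ err) / len(X)
        W1 -= lr * (X.T @ (dH * H * (1 - H))) / len(X)
    return 1.0 / (1.0 + np.exp(-X @ W1))    # hidden representation

# Stage 1: per-modality hidden representations from the pretrained SAEs.
H_video = sparse_autoencoder(X_video, n_hidden=32)
H_audio = sparse_autoencoder(X_audio, n_hidden=32)

# Stage 2: correlation analysis on the hidden features; the learned
# projections play the role of the paper's correlation coefficients.
cca = CCA(n_components=16)
Z_video, Z_audio = cca.fit_transform(H_video, H_audio)

# Fuse the projected views into one shared representation, e.g. by
# concatenation, then feed it to any emotion classifier.
shared = np.concatenate([Z_video, Z_audio], axis=1)
print(shared.shape)  # (200, 32)
```

In the paper's pipeline the fused representation would then be fed to an emotion classifier; concatenating the projected views is one common fusion choice, not necessarily the authors' exact scheme.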
Pages: 451-461
Page count: 11