Speech-Visual Emotion Recognition via Modal Decomposition Learning

Cited by: 1
Authors
Bai, Lei [1 ]
Chang, Rui [1 ]
Chen, Guanghui [2 ]
Zhou, Yu [1 ]
Affiliations
[1] North China Univ Water Resources & Elect Power, Sch Elect Engn, Zhengzhou 450000, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Visualization; Speech recognition; Mel frequency cepstral coefficient; Emotion recognition; Data mining; Three-dimensional displays; modal decomposition; speech modality; visual modality; FEATURES; FUSION;
DOI
10.1109/LSP.2023.3324294
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809
Abstract
Directly fusing extracted speech and visual features with neural networks has become a mainstream feature-fusion approach for speech-visual emotion recognition (SVER). However, the heterogeneity between the speech and visual modalities usually introduces a distribution gap and information redundancy between the extracted features, degrading SVER performance. To this end, this letter proposes an SVER method based on modal decomposition learning. It leverages shared, private, and reconstructed modal learning with a specifically designed loss to decompose the extracted speech and visual features into shared and private subspaces, yielding shared and private features and thereby effectively reducing the distribution gap and information redundancy between the two modalities. Experiments on the BAUM-1s, RAVDESS, and eNTERFACE05 datasets show that the proposed method achieves better results than existing approaches.
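The abstract's three ingredients — shared-subspace learning, private-subspace learning, and reconstruction — can be illustrated with a minimal numpy sketch of one common form of modal decomposition. All dimensions, encoder shapes, and loss weights below are assumptions for illustration, not the authors' exact architecture or loss design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "extracted" features: batch of 8, speech 32-d, visual 48-d (dims are made up).
speech = rng.standard_normal((8, 32))
visual = rng.standard_normal((8, 48))

d = 16  # common subspace dimension (assumption)

# Linear stand-ins for the encoders: each modality gets a shared-subspace
# projection and a private-subspace projection.
Ws_sp, Ws_vi = rng.standard_normal((32, d)) * 0.1, rng.standard_normal((48, d)) * 0.1
Wp_sp, Wp_vi = rng.standard_normal((32, d)) * 0.1, rng.standard_normal((48, d)) * 0.1
# Decoders map [shared; private] back to the original feature space.
Wd_sp, Wd_vi = rng.standard_normal((2 * d, 32)) * 0.1, rng.standard_normal((2 * d, 48)) * 0.1

sh_sp, sh_vi = speech @ Ws_sp, visual @ Ws_vi  # shared-subspace features
pr_sp, pr_vi = speech @ Wp_sp, visual @ Wp_vi  # private-subspace features

# 1) Shared loss: pull the two modalities' shared features together
#    (this is what shrinks the cross-modal distribution gap).
loss_shared = np.mean((sh_sp - sh_vi) ** 2)

# 2) Private loss: keep each modality's shared and private parts orthogonal
#    (squared Frobenius norm; this is what limits information redundancy).
loss_private = (np.linalg.norm(sh_sp.T @ pr_sp) ** 2
                + np.linalg.norm(sh_vi.T @ pr_vi) ** 2) / len(speech)

# 3) Reconstruction loss: shared + private together must still describe the
#    input, so the decomposition cannot discard emotion-relevant information.
rec_sp = np.concatenate([sh_sp, pr_sp], axis=1) @ Wd_sp
rec_vi = np.concatenate([sh_vi, pr_vi], axis=1) @ Wd_vi
loss_rec = np.mean((rec_sp - speech) ** 2) + np.mean((rec_vi - visual) ** 2)

# Equal weighting here is arbitrary; the letter designs a specific combined loss.
total = loss_shared + loss_private + loss_rec
```

In a trained model the encoders and decoders would be neural networks optimized to minimize `total`; the shared and private features are then fused for emotion classification.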
Pages: 1452-1456 (5 pages)