Speech-Visual Emotion Recognition via Modal Decomposition Learning

Cited by: 1
Authors
Bai, Lei [1 ]
Chang, Rui [1 ]
Chen, Guanghui [2 ]
Zhou, Yu [1 ]
Affiliations
[1] North China Univ Water Resources & Elect Power, Sch Elect Engn, Zhengzhou 450000, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Visualization; Speech recognition; Mel frequency cepstral coefficient; Emotion recognition; Data mining; Three-dimensional displays; modal decomposition; speech modality; visual modality; FEATURES; FUSION;
DOI
10.1109/LSP.2023.3324294
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809
Abstract
Directly fusing extracted speech and visual features with neural networks has become a mainstream feature-fusion approach for speech-visual emotion recognition (SVER). However, the heterogeneity between the speech and visual modalities usually introduces a distribution gap and information redundancy between the extracted features, degrading SVER performance. To this end, this letter proposes an SVER method based on modal decomposition learning. It leverages shared, private, and reconstructed modal learning with a specifically designed loss to decompose the extracted speech and visual features into shared and private subspaces, yielding shared and private features and thereby effectively reducing the distribution gap and information redundancy between the two modalities. Experiments on the BAUM-1s, RAVDESS, and eNTERFACE05 datasets show that the proposed method achieves better results than existing approaches.
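The abstract's three ingredients — shared-subspace learning, private-subspace learning, and reconstruction — can be illustrated with a minimal numpy sketch of one common form of modal decomposition. All dimensions, encoder shapes, and loss weights below are assumptions for illustration, not the authors' exact architecture or loss design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "extracted" features: batch of 8, speech 32-d, visual 48-d (dims are made up).
speech = rng.standard_normal((8, 32))
visual = rng.standard_normal((8, 48))

d = 16  # common subspace dimension (assumption)

# Linear stand-ins for the encoders: each modality gets a shared-subspace
# projection and a private-subspace projection.
Ws_sp, Ws_vi = rng.standard_normal((32, d)) * 0.1, rng.standard_normal((48, d)) * 0.1
Wp_sp, Wp_vi = rng.standard_normal((32, d)) * 0.1, rng.standard_normal((48, d)) * 0.1
# Decoders map [shared; private] back to the original feature space.
Wd_sp, Wd_vi = rng.standard_normal((2 * d, 32)) * 0.1, rng.standard_normal((2 * d, 48)) * 0.1

sh_sp, sh_vi = speech @ Ws_sp, visual @ Ws_vi  # shared-subspace features
pr_sp, pr_vi = speech @ Wp_sp, visual @ Wp_vi  # private-subspace features

# 1) Shared loss: pull the two modalities' shared features together
#    (this is what shrinks the cross-modal distribution gap).
loss_shared = np.mean((sh_sp - sh_vi) ** 2)

# 2) Private loss: keep each modality's shared and private parts orthogonal
#    (squared Frobenius norm; this is what limits information redundancy).
loss_private = (np.linalg.norm(sh_sp.T @ pr_sp) ** 2
                + np.linalg.norm(sh_vi.T @ pr_vi) ** 2) / len(speech)

# 3) Reconstruction loss: shared + private together must still describe the
#    input, so the decomposition cannot discard emotion-relevant information.
rec_sp = np.concatenate([sh_sp, pr_sp], axis=1) @ Wd_sp
rec_vi = np.concatenate([sh_vi, pr_vi], axis=1) @ Wd_vi
loss_rec = np.mean((rec_sp - speech) ** 2) + np.mean((rec_vi - visual) ** 2)

# Equal weighting here is arbitrary; the letter designs a specific combined loss.
total = loss_shared + loss_private + loss_rec
```

In a trained model the encoders and decoders would be neural networks optimized to minimize `total`; the shared and private features are then fused for emotion classification.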
Pages: 1452-1456 (5 pages)