Autoencoder with emotion embedding for speech emotion recognition

Cited by: 0
Authors
Zhang, Chenghao [1]
Xue, Lei [1]
Affiliations
[1] School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
Abstract
Speech emotion recognition (SER) is an important part of human-computer interaction and has received growing attention in recent years. Although a wide variety of SER methods has been proposed, their performance remains limited. A key cause of this low performance is the difficulty of effectively extracting emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, our model uses instance normalization, a common technique in the style transfer field, rather than batch normalization. Furthermore, the emotion embedding path leads the autoencoder to efficiently learn a priori knowledge from the label, which enables the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder with acoustic features obtained by the openSMILE toolkit, and the concatenated feature vector is then used for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and widely used databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement compared to other speech emotion recognition systems. © 2013 IEEE.
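The abstract names two mechanisms that a small sketch may help make concrete: instance normalization in place of batch normalization, and concatenation of the autoencoder's latent representation with acoustic features. The NumPy code below is only an illustration under stated assumptions, not the authors' implementation. The tensor layout `(batch, channels, time)`, the function names, and the feature sizes (64 and 384) are all hypothetical; the abstract does not specify them, and the random arrays merely stand in for real encoder outputs and openSMILE features.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) slice over the time axis.

    x is assumed to have shape (batch, channels, time).
    """
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and time axes.

    Uses the statistics of the current batch only (training-mode behavior),
    with no learned scale or shift.
    """
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def fuse_features(latent, acoustic):
    """Concatenate latent and acoustic vectors along the feature axis."""
    return np.concatenate([latent, acoustic], axis=1)


rng = np.random.default_rng(0)

# Hypothetical feature maps: 4 utterances, 8 channels, 50 frames.
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 50))
x_in = instance_norm(x)  # statistics computed per utterance and channel
x_bn = batch_norm(x)     # statistics shared across the whole batch

# Hypothetical sizes: a 64-dim latent code and 384 acoustic features.
latent = rng.normal(size=(4, 64))
acoustic = rng.normal(size=(4, 384))
fused = fuse_features(latent, acoustic)  # input to an emotion classifier
```

The difference between the two normalizations lies in which axes the statistics are pooled over: instance normalization treats every utterance independently, whereas batch normalization mixes statistics across all utterances in the batch.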
Pages: 51231-51241