Semi-supervised Ladder Networks for Speech Emotion Recognition

Cited by: 0
Authors
Jian-Hua Tao
Jian Huang
Ya Li
Zheng Lian
Ming-Yue Niu
Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences (CAS)
[3] CAS Center for Excellence in Brain Science and Intelligence Technology
Source
International Journal of Automation and Computing | 2019, Vol. 16
Keywords
Speech emotion recognition; the ladder network; semi-supervised learning; autoencoder; regularization
DOI
Not available
Abstract
As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefiting from deep learning, many researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we utilize semi-supervised ladder networks for speech emotion recognition. The model is trained by minimizing the supervised loss together with an auxiliary unsupervised cost function. The unsupervised auxiliary task yields more discriminative representations of the input features and also acts as a regularizer for the supervised emotion recognition task. We also compare the ladder network with other classical autoencoder structures. The experiments were conducted on the interactive emotional dyadic motion capture (IEMOCAP) database, and the results reveal that the proposed method achieves superior performance with a small amount of labelled data and outperforms the other methods.
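The objective the abstract describes, a supervised loss on labelled utterances plus a layer-wise unsupervised denoising cost computed on all data, can be written down compactly. Below is a minimal PyTorch sketch of that combined objective; it is a sketch under stated assumptions, not the architecture from the paper: the names (LadderSketch, noise_std, lam), the layer sizes, and the plain stacked decoder are illustrative, and it omits the batch normalization and learned combinator function of a full ladder network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderSketch(nn.Module):
    """Encoder-decoder with a clean and a noise-corrupted encoder pass.

    All layer sizes are illustrative placeholders, not the paper's setup.
    """

    def __init__(self, dims=(384, 256, 128, 4), noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        # Encoder: a stack of fully connected layers.
        self.enc = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)
        )
        # Decoder mirrors the encoder for layer-wise denoising.
        self.dec = nn.ModuleList(
            nn.Linear(dims[i + 1], dims[i])
            for i in reversed(range(len(dims) - 1))
        )

    def encode(self, x, noisy):
        """Return the activation at every layer; inject Gaussian noise if noisy."""
        h = x + self.noise_std * torch.randn_like(x) if noisy else x
        acts = [h]
        for i, layer in enumerate(self.enc):
            h = layer(h)
            if noisy:
                h = h + self.noise_std * torch.randn_like(h)
            if i < len(self.enc) - 1:  # no ReLU on the output (logit) layer
                h = F.relu(h)
            acts.append(h)
        return acts

    def forward(self, x):
        clean = self.encode(x, noisy=False)   # denoising targets
        noisy = self.encode(x, noisy=True)    # corrupted forward pass
        h, recon = noisy[-1], [noisy[-1]]
        for layer in self.dec:                # top-down reconstruction
            h = layer(h)
            recon.append(h)
        recon.reverse()                       # align with clean[0..L]
        return noisy[-1], clean, recon

def ladder_loss(logits, labels, clean, recon, lam=1.0):
    """Supervised cross-entropy (labelled data only) + layer-wise denoising cost."""
    sup = F.cross_entropy(logits, labels) if labels is not None else 0.0
    # Detaching the clean targets trains only the noisy/decoder path; whether
    # to detach is an implementation choice, not something the paper fixes.
    unsup = sum(F.mse_loss(r, c.detach()) for r, c in zip(recon, clean))
    return sup + lam * unsup

# Usage: one labelled and one unlabelled batch per training step.
model = LadderSketch()
x_lab, y = torch.randn(8, 384), torch.randint(0, 4, (8,))
x_unlab = torch.randn(32, 384)
logits, clean_l, recon_l = model(x_lab)
_, clean_u, recon_u = model(x_unlab)
loss = ladder_loss(logits, y, clean_l, recon_l) \
     + ladder_loss(None, None, clean_u, recon_u)
loss.backward()

The weight lam trades the unsupervised denoising cost off against the supervised loss, which is exactly the regularization role the abstract attributes to the auxiliary task.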
Pages: 437 - 448
Number of pages: 11
Related papers
50 records in total
  • [31] Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks
    Lee, Wonkyum
Han, Kyu J.
    Lane, Ian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3843 - 3847
  • [32] Semi-Supervised Group Emotion Recognition Based on Contrastive Learning
    Zhang, Jiayi
    Wang, Xingzhi
    Zhang, Dong
    Lee, Dah-Jye
    ELECTRONICS, 2022, 11 (23)
  • [33] Semi-supervised Emotion Recognition using Inconsistently Annotated Data
    Happy, S. L.
    Dantcheva, Antitza
    Bremond, Francois
    2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 286 - 293
  • [34] Semi-Supervised Dictionary Learning of Sparse Representations for Emotion Recognition
    Kaechele, Markus
    Schwenker, Friedhelm
    PARTIALLY SUPERVISED LEARNING, PSL 2013, 2013, 8193 : 21 - 35
  • [35] USING COLLECTIVE INFORMATION IN SEMI-SUPERVISED LEARNING FOR SPEECH RECOGNITION
    Varadarajan, Balakrishnan
    Yu, Dong
    Deng, Li
    Acero, Alex
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4633 - +
  • [36] DEEP CONTEXTUALIZED ACOUSTIC REPRESENTATIONS FOR SEMI-SUPERVISED SPEECH RECOGNITION
    Ling, Shaoshi
    Liu, Yuzong
    Salazar, Julian
    Kirchhoff, Katrin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6429 - 6433
  • [37] Unsupervised and semi-supervised adaptation of a hybrid speech recognition system
    Trmal, Jan
    Zelinka, Jan
    Mueller, Ludek
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 527 - 530
  • [38] Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition
    Higuchi, Yosuke
    Moritz, Niko
    Le Roux, Jonathan
    Hori, Takaaki
    INTERSPEECH 2021, 2021, : 726 - 730
  • [39] Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning
    Humayun, Mohammad Ali
    Hameed, Ibrahim A.
    Shah, Syed Muslim
    Khan, Sohaib Hassan
    Zafar, Irfan
    Bin Ahmed, Saad
    Shuja, Junaid
    APPLIED SCIENCES-BASEL, 2019, 9 (09):
  • [40] Emotion recognition using semi-supervised feature selection with speaker normalization
Sun, Y.
Wen, G.
    International Journal of Speech Technology, 2015, 18 (3) : 317 - 331