Semi-supervised Ladder Networks for Speech Emotion Recognition

Cited by: 0

Authors:

Jian-Hua Tao

Jian Huang

Ya Li

Zheng Lian

Ming-Yue Niu

Affiliations:

[1] National Laboratory of Pattern Recognition, School of Artificial Intelligence

[2] University of Chinese Academy of Science (CAS), CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation

[3] Chinese Academy of Sciences

Source:

International Journal of Automation and Computing | 2019 / Vol. 16

Keywords:

Speech emotion recognition; the ladder network; semi-supervised learning; autoencoder; regularization;

DOI:

Not available


Abstract:

As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefiting from deep learning, researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we apply semi-supervised ladder networks to speech emotion recognition. The model is trained by jointly minimizing a supervised loss and an auxiliary unsupervised cost function. The unsupervised auxiliary task provides powerful discriminative representations of the input features and also acts as a regularizer for the supervised emotion recognition task. We also compare the ladder network with other classical autoencoder structures. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database show that the proposed method achieves superior performance with a small amount of labelled data and outperforms the other methods.
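The training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of either term, so everything below is an assumption: a softmax cross-entropy as the supervised loss, a per-layer weighted mean squared denoising error as the unsupervised cost (as in the general ladder network formulation), and all function and parameter names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy; logits is (n, classes), labels is (n,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def reconstruction_cost(clean_acts, denoised_acts, layer_weights):
    """Weighted sum over layers of the mean squared denoising error (assumed form)."""
    return sum(
        w * np.mean((z - z_hat) ** 2)
        for w, z, z_hat in zip(layer_weights, clean_acts, denoised_acts)
    )

def semi_supervised_loss(logits_labelled, labels, clean_acts, denoised_acts, layer_weights):
    """Supervised loss on labelled data plus the auxiliary unsupervised cost."""
    supervised = softmax_cross_entropy(logits_labelled, labels)
    unsupervised = reconstruction_cost(clean_acts, denoised_acts, layer_weights)
    return supervised + unsupervised
```

In this sketch the reconstruction term needs no labels, so it can be computed on labelled and unlabelled utterances alike, which is what lets the unlabelled data regularize the supervised task.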

Citation

Pages: 437-448

Page count: 11
