Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients

Cited by: 0
Authors
Huang Y. [1 ,2 ]
Wu A. [1 ,2 ]
Zhang G. [1 ,2 ]
Li Y. [1 ,2 ]
Affiliations
[1] School of Automation, Southeast University, Nanjing
[2] Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education
Source
International Journal of Simulation: Systems, Science and Technology | 2016, Vol. 17, No. 28
Keywords
Acoustic features; Coiflet Wavelet packets Cepstral Coefficients (CWPCC); Deep Belief Networks (DBNs); Deep learning; Speech emotion recognition;
DOI
10.5013/IJSSST.a.17.28.28
Abstract
This paper proposes a wavelet-packet-based adaptive filter-bank construction combined with Deep Belief Network (DBN) feature learning for speech signal processing. On this basis, a set of acoustic features, Coiflet Wavelet Packet Cepstral Coefficients (CWPCC), is extracted for speech emotion recognition. CWPCC extends the conventional Mel-Frequency Cepstral Coefficients (MFCC) by adapting the filter-bank structure to the decision task. Deep Belief Networks (DBNs) are artificial neural networks with more than one hidden layer; they are first pre-trained layer by layer and then fine-tuned with the back-propagation algorithm. Well-trained deep networks can model complex, non-linear structure in the input training data and better predict the probability distribution over classification labels. A speech emotion recognition system is constructed from the CWPCC feature set, the DBN feature-learning structure, and a Support Vector Machine classifier. Experimental results on the Berlin emotional speech database show that Coiflet wavelet packet features are better suited to speech emotion recognition than other acoustic features, and that the proposed DBN feature-learning structure combined with CWPCC improves emotion recognition performance over conventional methods. © 2016, UK Simulation Society. All rights reserved.
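The CWPCC recipe summarized in the abstract (Coiflet wavelet-packet decomposition of a speech frame, log sub-band energies, then a DCT as in MFCC) can be sketched as follows. This is a hypothetical NumPy-only illustration, not the authors' implementation: the specific Coiflet filter (coif1), decomposition level, and number of cepstral coefficients are assumptions for the sketch.

```python
import numpy as np

# coif1 analysis low-pass filter coefficients (Coiflet wavelet family).
LO = np.array([-0.015655728135465, -0.072732619512854, 0.384864846864203,
                0.852572020212255,  0.337897662457809, -0.072732619512854])
# High-pass filter from the quadrature-mirror relation g[n] = (-1)^n h[L-1-n].
HI = LO[::-1] * (-1.0) ** np.arange(len(LO))

def wavelet_packet_leaves(signal, level):
    """Full wavelet-packet tree: split every sub-band at every level."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        nxt = []
        for b in bands:
            nxt.append(np.convolve(b, LO)[::2])  # low-pass + downsample by 2
            nxt.append(np.convolve(b, HI)[::2])  # high-pass + downsample by 2
        bands = nxt
    return bands  # 2**level leaf sub-bands

def dct2_ortho(x):
    """Orthonormal DCT-II, the same transform used for MFCC-style cepstra."""
    n = len(x)
    k = np.arange(n)[:, None]
    c = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n)) @ x
    c *= np.sqrt(2.0 / n)
    c[0] /= np.sqrt(2.0)
    return c

def cwpcc(frame, level=4, n_ceps=13):
    """Cepstral coefficients from log energies of wavelet-packet sub-bands."""
    energies = np.array([np.sum(b ** 2)
                         for b in wavelet_packet_leaves(frame, level)])
    log_e = np.log(energies + 1e-10)        # floor to avoid log(0)
    return dct2_ortho(log_e)[:n_ceps]

frame = np.hanning(512) * np.random.randn(512)  # one windowed "speech" frame
print(cwpcc(frame).shape)  # (13,)
```

Each frame then yields a 13-dimensional CWPCC vector; per the abstract, such features would feed the DBN feature-learning stage and finally an SVM classifier. Adapting which tree nodes are split (rather than the full decomposition used here) is what makes the paper's filter bank task-adaptive.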
Pages: 28.1-28.5
References
14 in total
[1]  
Caponetti L., Buscicchio C.A., Castellano G., Biologically inspired emotion recognition from speech, EURASIP Journal on Advances in Signal Processing, 2011, 1, pp. 1-10, (2011)
[2]  
Morrison D., Wang R.L., De Silva L.C., Ensemble methods for spoken emotion recognition in call-centres, Speech Communication, 49, 2, pp. 98-112, (2007)
[3]  
Petrushin V., Emotion recognition in speech signal: Experimental study, development, and application, ICSLP 2000, pp. 222-225, (2000)
[4]  
Malta L., Miyajima C., Kitaoka N., Et al., Multimodal estimation of a driver's spontaneous irritation, Intelligent Vehicles Symposium, 2009 IEEE, pp. 573-577, (2009)
[5]  
France D.J., Shiavi R.G., Silverman S., Et al., Acoustical properties of speech as indicators of depression and suicidal risk, IEEE Transactions on Biomedical Engineering, 47, 7, pp. 829-837, (2000)
[6]  
Mallat S., A Wavelet Tour of Signal Processing, (2009)
[7]  
Bengio Y., Deep learning of representations for unsupervised and transfer learning, Journal of Machine Learning Research-Proceedings Track, 27, 2, pp. 17-36, (2012)
[8]  
Hinton G.E., Salakhutdinov R.R., Reducing the dimensionality of data with neural networks, Science, 313, 5786, pp. 504-507, (2006)
[9]  
Bengio Y., Learning Deep Architectures for AI, 2, 1, pp. 67-76, (2009)
[10]  
Huang Y., Wu A.O., Zhang G., Li Y., Speech Emotion Recognition Based on Coiflet Wavelet Packet Cepstral Coefficients, Chinese Conference on Pattern Recognition, 2014, pp. 436-443, (2014)