Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning

Cited by: 8
Authors
Xiaoqing J. [1, 2]
Kewen X. [1]
Yongliang L. [1, 3]
Jianchuan B. [1]
Affiliations
[1] School of Electronic and Information Engineering, Hebei University of Technology, Tianjin
[2] School of Information Science and Engineering, University of Jinan, Jinan
[3] Information Center, Tianjin Chengjian University, Tianjin
Source
J. China Univ. Post Telecom. | Vol. 2 / No. 1,17-9
Keywords
compressed sensing; feature selection; multiple-kernel learning; speech emotion recognition
DOI
10.1016/S1005-8885(17)60193-6
Abstract
Speech emotion recognition (SER) in noisy environments is a vital issue in artificial intelligence (AI). In this paper, speech samples are reconstructed to remove the added noise. Acoustic features extracted from the reconstructed samples are then selected to build an optimal feature subset with better emotional recognizability. A multiple-kernel (MK) support vector machine (SVM) classifier, solved by semi-definite programming (SDP), is adopted for the SER procedure. The proposed method is evaluated on the Berlin Database of Emotional Speech. Recognition accuracies on the original, noisy, and reconstructed samples, classified by both single-kernel (SK) and MK classifiers, are compared and analyzed. The experimental results show that the proposed method is effective and robust in the presence of noise.
© 2017 The Journal of China Universities of Posts and Telecommunications
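The abstract does not say which base kernels or weights the MK classifier uses. To illustrate what a multiple-kernel SVM operates on, the following Python sketch builds a combined Gram matrix K = Σ μ_m K_m from three common base kernels: linear, Gaussian RBF, and polynomial. It rests on several assumptions. The choice of kernels, the `gamma`, `degree`, and `c` values, the hand-set weights, and all function names are hypothetical. The sketch also stops short of the paper's full method: it neither solves the SDP that would learn the weights nor trains the SVM.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
Matrix = List[List[float]]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Linear kernel: the inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed hyperparameter."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def poly_kernel(degree: int, c: float = 1.0) -> Kernel:
    """Polynomial kernel (<x, y> + c)^degree; degree and c are assumed hyperparameters."""
    def k(x: Vector, y: Vector) -> float:
        return (linear_kernel(x, y) + c) ** degree
    return k


def gram(kernel: Kernel, xs: Sequence[Vector]) -> Matrix:
    """Gram matrix K[i][j] = kernel(xs[i], xs[j]) over one set of feature vectors."""
    return [[kernel(a, b) for b in xs] for a in xs]


def combine_kernels(grams: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum K = sum_m mu_m * K_m of base Gram matrices.

    Non-negative weights keep the result positive semidefinite (a valid kernel).
    The weights here are fixed by hand (hypothetical); an MK-SVM would learn them.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three toy 2-D vectors standing in for selected acoustic feature vectors.
    xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    grams = [gram(linear_kernel, xs), gram(rbf_kernel(0.5), xs), gram(poly_kernel(2), xs)]
    K = combine_kernels(grams, [0.2, 0.5, 0.3])
    for row in K:
        print([round(v, 4) for v in row])
```

Each base Gram matrix is positive semidefinite, and the weights are non-negative, so the combined matrix is also positive semidefinite. It can therefore be used wherever a single kernel would be. In an SDP-based multiple-kernel SVM, the weights would typically be learned together with the classifier, usually under a trace constraint on the combined matrix, rather than set by hand as in this sketch.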
Pages: 1,17 / 9