Joint subspace learning and feature selection method for speech emotion recognition

Cited by: 0
Authors
Song P. [1 ]
Zheng W. [2 ]
Zhao L. [2 ]
Affiliations
[1] School of Computer and Control Engineering, Yantai University, Yantai
[2] Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing
Source
Qinghua Daxue Xuebao/Journal of Tsinghua University | 2018, Vol. 58, No. 04
Keywords
Emotion recognition; Feature selection; Subspace learning
DOI
10.16511/j.cnki.qhdxxb.2018.26.014
Abstract
Traditional speech emotion recognition methods are trained and evaluated on a single corpus; when the training and testing data come from different corpora, recognition performance drops drastically. A joint subspace learning and feature selection method is presented here to improve cross-corpus recognition. In this method, the feature subspace is learned via a regression algorithm, with the l2,1-norm imposed for feature selection. The maximum mean discrepancy (MMD) is then used to measure the feature divergence between different corpora. Tests show that this algorithm gives satisfactory results for cross-corpus speech emotion recognition and is more robust and efficient than state-of-the-art transfer learning methods. © 2018, Tsinghua University Press. All rights reserved.
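The abstract names two computational ingredients: an l2,1-regularized regression that jointly learns the feature subspace and selects features, and an MMD term that measures the divergence between corpora. The snippet below is a minimal sketch of how these pieces can fit together, not the authors' implementation: it assumes a linear (mean-matching) special case of MMD, one-hot emotion labels, and the standard iterative-reweighting solver for the l2,1 penalty. The function fit, the trade-off parameters lam and mu, and the solver scheme are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of joint
# subspace learning + l2,1 feature selection with an MMD alignment term:
#   min_W ||Xs W - Ys||_F^2 + lam * ||W||_{2,1}
#         + mu * || mean(Xs W) - mean(Xt W) ||^2
import numpy as np

def fit(Xs, Ys, Xt, lam=1.0, mu=1.0, n_iter=50, eps=1e-8):
    """Xs: (ns, d) labeled source-corpus features with one-hot labels Ys (ns, c);
    Xt: (nt, d) unlabeled target-corpus features. Returns projection W (d, c)."""
    ns, d = Xs.shape
    nt = Xt.shape[0]
    # Linear-kernel MMD: the squared distance between projected corpus means
    # equals tr(W^T A W) with A = v v^T, where v is the difference of means.
    X = np.vstack([Xs, Xt])
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    v = X.T @ e                        # (d,) difference of corpus means
    A_mmd = np.outer(v, v)             # (d, d) PSD alignment matrix
    # Ridge initialization, then reweighted least squares for the l2,1 term:
    # each row norm ||w_i|| enters through a diagonal weight D_ii = 1/(2||w_i||).
    W = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ Ys)
    for _ in range(n_iter):
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(Xs.T @ Xs + lam * D + mu * A_mmd, Xs.T @ Ys)
    return W

# Usage on random stand-in data: rows of W with large l2 norm mark the
# selected features.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((100, 20))
Ys = np.eye(4)[rng.integers(0, 4, size=100)]   # 4 hypothetical emotion classes
Xt = rng.standard_normal((80, 20))
W = fit(Xs, Ys, Xt)
selected = np.argsort(-np.linalg.norm(W, axis=1))[:5]
print("top-5 selected feature indices:", selected)
```

The l2,1 penalty drives entire rows of W toward zero, which is what lets the learned projection double as a feature selector: features whose rows survive with large l2 norm are the ones retained across corpora.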
Pages: 347-351
Page count: 4
References
18 in total
[1] Han W.J., Li H.F., Ruan H.B., et al., Review on speech emotion recognition, Journal of Software, 25, 1, pp. 37-50, (2014)
[2] Han K., Yu D., Tashev I., Speech emotion recognition using deep neural network and extreme learning machine, Proceedings of the 15th Annual Conference of the International Speech Communication Association, pp. 223-227, (2014)
[3] Kinnunen T., Li H.Z., An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, 52, 1, pp. 12-40, (2010)
[4] Hu H., Xu M.X., Wu W., GMM supervector based SVM with spectral features for speech emotion recognition, Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 413-416, (2007)
[5] El Ayadi M., Kamel M.S., Karray F., Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition, 44, 3, pp. 572-587, (2011)
[6] Weiss K., Khoshgoftaar T.M., Wang D.D., A survey of transfer learning, Journal of Big Data, 3, 1, pp. 1-40, (2016)
[7] Deng J., Zhang Z.X., Eyben F., et al., Autoencoder-based unsupervised domain adaptation for speech emotion recognition, IEEE Signal Processing Letters, 21, 9, pp. 1068-1072, (2014)
[8] Abdelwahab M., Busso C., Supervised domain adaptation for emotion recognition from speech, Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5058-5062, (2015)
[9] Hassan A., Damper R., Niranjan M., On acoustic emotion recognition: Compensating for covariate shift, IEEE Transactions on Audio, Speech, and Language Processing, 21, 7, pp. 1458-1468, (2013)
[10] Song P., Zheng W.M., Liang R.Y., Speech emotion recognition based on sparse transfer learning method, IEICE Transactions on Information and Systems, 98, 7, pp. 1409-1412, (2015)