A supervised non-negative matrix factorization model for speech emotion recognition

Cited by: 15
Authors
Hou, Mixiao [1]
Li, Jinxing [2, 3]
Lu, Guangming [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Speech emotion recognition; Non-negative matrix factorization; Discriminative information; Sample similarity; Low-dimensional representation; Classification
DOI
10.1016/j.specom.2020.08.002
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Feature representation plays a critical role in speech emotion recognition (SER). As a dimensionality reduction method, non-negative matrix factorization (NMF) obtains a low-dimensional representation of data by matrix decomposition and can make the data more distinguishable. To improve the recognition ability of NMF for SER, we study NMF and propose a supervised NMF model, called joint discrimination ability and similarity constraint of NMF (DSNMF). This model incorporates the discriminative information and similarity information of samples into basic NMF as prior knowledge, so that the original data can be decomposed into more discriminative low-dimensional data. Specifically, the labels of the training set are used to improve the discriminative ability of the model, and the similarity of the training samples is used so that similar samples are more tightly aggregated in the low-dimensional space. In addition, the convergence of DSNMF is proved theoretically and verified experimentally. Extensive experiments on the EMODB and IEMOCAP corpora show that the low-dimensional representations produced by the proposed approach are classified more accurately than those of other NMF models.
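As a rough illustration of the basic NMF that the abstract builds on, the sketch below factorizes a non-negative data matrix into a basis and a low-dimensional representation using the standard multiplicative updates for the Frobenius-norm objective. It is not the DSNMF model: this record does not give the label and similarity terms, so they are omitted. The function name `basic_nmf`, its parameters, and the random test matrix are assumptions made for illustration only.

```python
import numpy as np


def basic_nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative X (features x samples) as W @ H, with W, H >= 0.

    Hypothetical helper: plain unsupervised NMF, without the label or
    similarity constraints that DSNMF adds.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Random non-negative initialisation of the basis W and the representation H.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Track the Frobenius reconstruction error after each iteration.
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors


# Usage on synthetic data: 20 features, 30 samples, reduced to 5 dimensions.
X = np.random.default_rng(1).random((20, 30))
W, H, errors = basic_nmf(X, rank=5)
print(H.shape)  # each column of H is the low-dimensional representation of one sample
```

Each column of `H` would then be passed to a classifier in place of the original feature vector. A supervised variant such as the one described in the abstract would add further terms to the objective, which changes the update rules.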
Pages: 13-20
Page count: 8
Related papers
30 in total
[1]  
Burkhardt F., 2005, P INTERSPEECH
[2]   IEMOCAP: interactive emotional dyadic motion capture database [J].
Busso, Carlos ;
Bulut, Murtaza ;
Lee, Chi-Chun ;
Kazemzadeh, Abe ;
Mower, Emily ;
Kim, Samuel ;
Chang, Jeannette N. ;
Lee, Sungbok ;
Narayanan, Shrikanth S. .
LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) :335-359
[3]   Graph Regularized Nonnegative Matrix Factorization for Data Representation [J].
Cai, Deng ;
He, Xiaofei ;
Han, Jiawei ;
Huang, Thomas S. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (08) :1548-1560
[4]   3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition [J].
Chen, Mingyi ;
He, Xuanji ;
Yang, Jing ;
Zhang, Han .
IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) :1440-1444
[5]   Survey on speech emotion recognition: Features, classification schemes, and databases [J].
El Ayadi, Moataz ;
Kamel, Mohamed S. ;
Karray, Fakhri .
PATTERN RECOGNITION, 2011, 44 (03) :572-587
[6]  
Eyben F., 2010, P ACM INT C MULT, P1459
[7]   Manifold Regularized Discriminative Nonnegative Matrix Factorization With Fast Gradient Descent [J].
Guan, Naiyang ;
Tao, Dacheng ;
Luo, Zhigang ;
Yuan, Bo .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2011, 20 (07) :2030-2048
[8]  
Gupta S, 2015, 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN) 2015, P570, DOI 10.1109/SPIN.2015.7095427
[9]  
Han K, 2014, INTERSPEECH, P223
[10]   Semi-Supervised Non-Negative Matrix Factorization With Dissimilarity and Similarity Regularization [J].
Jia, Yuheng ;
Kwong, Sam ;
Hou, Junhui ;
Wu, Wenhui .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (07) :2510-2521