Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition

Cited by: 29
Authors
Bandela, Surekha Reddy [1 ]
Kumar, T. Kishore [1 ]
Affiliations
[1] NIT Warangal, Dept ECE, Warangal, Andhra Pradesh, India
Keywords
Speech Emotion Recognition; Feature optimization; Unsupervised feature selection; Speech de-noising; NMF; ENHANCEMENT; SPARSE;
DOI
10.1016/j.apacoust.2020.107645
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206; 082403
Abstract
Speech feature fusion is the most commonly used approach for improving accuracy in Speech Emotion Recognition (SER). However, it has the disadvantage of increasing the complexity of the SER system in terms of processing time. In addition, some of the fused features may be redundant, contribute nothing to SER, and lead to incorrect emotion prediction and reduced SER accuracy. To overcome this problem, this paper applies unsupervised feature selection to a feature set combining the INTERSPEECH 2010 paralinguistic features, Gammatone Cepstral Coefficients (GTCC) and Power Normalized Cepstral Coefficients (PNCC). Feature Selection with Adaptive Structure Learning (FSASL), Unsupervised Feature Selection with Ordinal Locality (UFSOL) and a novel Subset Feature Selection (SuFS) algorithm are the feature dimension reduction techniques used to achieve better SER performance in this work. The proposed SER system is analyzed in both clean and noisy environments. The EMO-DB and IEMOCAP emotion databases are used to evaluate the proposed SER performance. For the noise analysis, the clean speech is corrupted with different noises from the Aurora noise database and with white Gaussian noise at Signal to Noise Ratio (SNR) levels from -5 dB to 20 dB. A Support Vector Machine (SVM) classifier with linear and Radial Basis Function (RBF) kernels, using 10-fold cross-validation and hold-out validation, is employed in this analysis with classification accuracy and computation time as the performance metrics. The results show that the proposed SER system outperforms the baseline SER system as well as many existing works in the literature in both clean and noisy conditions. For SNR levels above 15 dB, the proposed SER system performs on par with SER in clean environments in the presence of the different noises, whereas for SNRs below 15 dB the performance degrades. Therefore, to overcome this drawback and improve SER performance in noisy conditions, a dense Non-Negative Matrix Factorization (denseNMF) method is adopted for de-noising the noisy speech signal prior to SER, achieving noise robustness. (C) 2020 Elsevier Ltd. All rights reserved.
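To make the pipeline summarized above concrete, the following Python sketch illustrates the general idea under stated assumptions: a generic NMF-based spectrogram de-noising stage (standing in for the paper's denseNMF, whose exact formulation is not reproduced here) followed by feature selection and SVM classification with 10-fold cross-validation. The function names, parameters, and the VarianceThreshold selector (a placeholder for FSASL/UFSOL/SuFS) are illustrative assumptions, not the authors' implementation; extraction of the INTERSPEECH 2010, GTCC and PNCC features is assumed to happen elsewhere.

```python
# Minimal sketch, not the authors' denseNMF or SuFS implementation.
import numpy as np
import librosa
from sklearn.decomposition import NMF
from sklearn.feature_selection import VarianceThreshold
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def nmf_denoise(noisy, n_fft=512, hop=128, n_speech=30, n_noise=10, noise_frames=20):
    """Suppress additive noise by factorizing the magnitude spectrogram with NMF.

    The leading `noise_frames` frames are assumed to be noise-only and are used
    to learn a noise dictionary; the remaining spectral structure is modeled by
    speech bases. This is a generic NMF de-noising scheme, not denseNMF.
    """
    S = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(S).astype(np.float64), np.angle(S)

    # Noise dictionary learned from the (assumed) noise-only leading frames.
    W_noise = NMF(n_components=n_noise, max_iter=400).fit_transform(mag[:, :noise_frames])

    # Factorize the whole spectrogram, seeding part of W with the noise bases.
    rng = np.random.default_rng(0)
    W0 = np.hstack([rng.random((mag.shape[0], n_speech)), W_noise])
    H0 = rng.random((n_speech + n_noise, mag.shape[1]))
    model = NMF(n_components=n_speech + n_noise, init="custom", max_iter=400)
    W = model.fit_transform(mag, W=W0, H=H0)
    H = model.components_

    # Wiener-style mask that keeps only the speech components.
    speech_mag = W[:, :n_speech] @ H[:n_speech, :]
    mask = speech_mag / (W @ H + 1e-8)
    return librosa.istft(mask * mag * np.exp(1j * phase), hop_length=hop)


def svm_emotion_accuracy(features, labels):
    """Feature selection + SVM scored with 10-fold cross-validation.

    VarianceThreshold is only a stand-in for FSASL / UFSOL / SuFS.
    """
    selected = VarianceThreshold(threshold=1e-3).fit_transform(features)
    return cross_val_score(SVC(kernel="rbf"), selected, labels, cv=10).mean()
```

In such a pipeline, the de-noised waveform returned by `nmf_denoise` would be passed to the acoustic feature extractors, and the resulting fused feature matrix and emotion labels would then be scored with `svm_emotion_accuracy`.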
Pages: 15