Speech emotion recognition using semi-NMF feature optimization

Cited by: 7
Authors
Bandela, Surekha Reddy [1]
Kumar, T. Kishore [1]
Affiliations
[1] NIT Warangal, Dept Elect & Commun Engn, Hanamkonda, Telangana, India
Keywords
Speech emotion recognition; spectral; Teager energy operator; feature fusion; semi-nonnegative matrix factorization; k-nearest neighborhood; support vector machine; FEATURE-SELECTION; CLASSIFICATION; FREQUENCY;
DOI
10.3906/elk-1903-121
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) is an active area of research. Many SER systems combine different speech features to improve performance, but training on the resulting large feature set increases classifier complexity. In addition, some features may be irrelevant to emotion detection, which lowers recognition accuracy. To overcome this drawback, feature optimization can be applied to the feature set to obtain the most relevant emotional features before classification. In this paper, semi-nonnegative matrix factorization (semi-NMF) with singular value decomposition (SVD) initialization is used to optimize the speech features. The speech features considered are mel-frequency cepstral coefficients, linear prediction cepstral coefficients, and Teager energy operator-autocorrelation (TEO-AutoCorr). Emotions are classified using k-nearest neighborhood and support vector machine (SVM) classifiers with a 5-fold cross-validation scheme. The datasets used for the performance analysis are EMO-DB and IEMOCAP. The performance of the proposed SER system is evaluated in terms of classification accuracy. The results show that optimizing the feature sets with semi-NMF improves accuracy remarkably over the baseline SER system without optimization.
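To make the factorization step in the abstract concrete, here is a minimal NumPy sketch of semi-NMF using the standard multiplicative updates. It is not the authors' implementation. The function name `semi_nmf`, the features-by-samples layout of `X`, the iteration count, and the particular SVD-based initialization (absolute values of the leading right singular vectors plus a small offset) are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """Approximate X (p x n) by F @ G.T with G >= 0 and F unconstrained.

    Assumed layout: p features, n samples; row i of G is a k-dimensional
    nonnegative representation of sample i.
    """
    pos = lambda A: (np.abs(A) + A) / 2   # positive part of a matrix
    neg = lambda A: (np.abs(A) - A) / 2   # magnitude of the negative part

    # Assumed SVD-based initialization: leading right singular vectors,
    # made nonnegative, with a small offset to avoid exact zeros.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    G = np.abs(Vt[:k].T) + 0.1            # shape (n, k)

    for _ in range(n_iter):
        # F update: least-squares solution for fixed G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # G update: multiplicative rule that keeps G nonnegative.
        XtF = X.T @ F                      # shape (n, k)
        FtF = F.T @ F                      # shape (k, k)
        num = pos(XtF) + G @ neg(FtF)
        den = neg(XtF) + G @ pos(FtF)
        G *= np.sqrt((num + eps) / (den + eps))

    # Final F consistent with the last G.
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G
```

Under these assumptions, the rows of `G` could serve as the reduced feature vectors passed to a k-nearest neighbor or SVM classifier; how the paper maps the factors to classifier inputs is not stated in the abstract.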
Pages: 3741-3757
Number of pages: 17