An Enhanced Emotion Recognition Algorithm Using Pitch Correlogram, Deep Sparse Matrix Representation and Random Forest Classifier

Cited by: 6
Authors
Hamsa, Shibani [1 ]
Iraqi, Youssef [1 ]
Shahin, Ismail [2 ]
Werghi, Naoufel [1 ]
Affiliations
[1] Khalifa University of Science, Technology and Research, Center for Cyber-Physical Systems (C2PS), Department of Electrical and Computer Engineering (ECE), Abu Dhabi, United Arab Emirates
[2] University of Sharjah, Department of Electrical Engineering, Sharjah, United Arab Emirates
Keywords
Emotion recognition; Feature extraction; Mel frequency cepstral coefficient; Noise reduction; Speech recognition; Hidden Markov models; Computational modeling; Random forest classifier; Speech; Speaker; Identification; Features
DOI
10.1109/ACCESS.2021.3086062
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This work presents an approach to text-independent and speaker-independent emotion recognition from speech in real application situations such as noisy and stressful talking conditions. We introduce a new method for feature extraction, representation, and noise reduction that replaces the cepstral features frequently used in the literature. The proposed algorithm combines a pitch-correlogram-based noise-reduction pre-processing module, a sparse-dense decomposition-based feature representation, and a random forest classifier. The work is assessed in terms of efficiency and computational complexity using English and Arabic datasets recorded under noisy and stressful talking conditions. Our system yields a significant improvement over other techniques based on the same classifier model, and the proposed architecture achieves a significant rise in performance compared with the recent literature on benchmark datasets.
Pages: 87995-88010
Page count: 16
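The abstract describes a three-stage pipeline: pitch-correlogram-based noise reduction, a sparse-dense decomposition used as the feature representation, and a random forest classifier. The Python sketch below is a minimal illustration of that structure only, not the paper's implementation: the framing parameters, the autocorrelation-peak criterion for discarding noise-dominated frames, and the random projection dictionary standing in for a learned sparse code are assumptions made for the example, and every helper function name is hypothetical. Only the scikit-learn RandomForestClassifier corresponds directly to the classifier named in the abstract.

```python
# Illustrative sketch of the pipeline structure described in the abstract.
# All thresholds, frame sizes, and helper names are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def frame_signal(x, frame_len=400, hop=160):
    # Split a 1-D signal into overlapping frames (assumed sizes).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def pitch_correlogram(frames):
    # Normalised autocorrelation per frame; a pitch shows up as a peak at a nonzero lag.
    corr = np.array([np.correlate(f, f, mode="full")[len(f) - 1:] for f in frames])
    corr /= corr[:, :1] + 1e-12          # normalise by lag-0 energy
    return corr

def denoise(frames, corr, peak_thresh=0.3):
    # Keep frames whose correlogram shows a clear periodic peak (hypothetical criterion).
    peak = corr[:, 20:].max(axis=1)      # ignore very small lags
    return frames[peak > peak_thresh]

def sparse_dense_features(frames, n_atoms=32, keep=5):
    # Crude stand-in for a learned sparse representation: project frames onto a
    # random dictionary and keep only the largest coefficients per frame.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((frames.shape[1], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    codes = frames @ D
    idx = np.argsort(np.abs(codes), axis=1)[:, :-keep]
    np.put_along_axis(codes, idx, 0.0, axis=1)   # zero out small coefficients
    return codes.mean(axis=0)            # one dense vector per utterance

def extract(utterance):
    frames = frame_signal(utterance)
    kept = denoise(frames, pitch_correlogram(frames))
    if len(kept) == 0:                   # fall back if every frame was rejected
        kept = frames
    return sparse_dense_features(kept)

# Toy usage with synthetic "utterances" (random signals) and random labels.
rng = np.random.default_rng(1)
X = np.array([extract(rng.standard_normal(16000)) for _ in range(40)])
y = rng.integers(0, 4, size=40)          # e.g. 4 emotion classes
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```

In a real system the dictionary would be learned from speech data (e.g. with a K-SVD-style method) rather than drawn at random, and the frame-rejection threshold would be tuned on the target noisy conditions.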