Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features

Cited by: 44
Authors
Abdel-Hamid, Lamiaa [1]
Affiliations
[1] Misr Int Univ, Fac Engn, Dept Elect & Commun, Cairo, Egypt
Keywords
Speech emotion recognition; Arabic speech emotion database; Prosodic features; Mel-frequency cepstral coefficients (MFCC); Long-term average spectrum (LTAS); Wavelet transform; Database
DOI
10.1016/j.specom.2020.04.005
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech emotion recognition (SER) has recently been receiving increased interest due to rapid advancements in affective computing and human-computer interaction. English, German, Mandarin and Indian languages are among the most commonly studied for SER, along with other European and Asian languages. However, few studies have implemented Arabic SER systems, owing to the scarcity of available Arabic speech emotion databases. Although Egyptian Arabic is considered one of the most widely spoken and understood Arabic dialects in the Middle East, no Egyptian Arabic speech emotion database has yet been devised. In this work, a semi-natural Egyptian Arabic speech emotion (EYASE) database is introduced, created from an award-winning Egyptian TV series. The EYASE database includes utterances from 3 male and 3 female professional actors and covers four emotions: angry, happy, neutral and sad. Prosodic, spectral and wavelet features are computed from the EYASE database for emotion recognition. In addition to the classical pitch, intensity, formant and Mel-frequency cepstral coefficient (MFCC) features widely used for SER, long-term average spectrum (LTAS) and wavelet parameters are also considered. Speaker-independent and speaker-dependent experiments were performed for three cases: (1) emotion vs. neutral classification, (2) arousal and valence classification and (3) multi-emotion classification. Several analyses were conducted to explore different aspects of Arabic SER, including the effect of gender and culture. Furthermore, feature ranking was performed to evaluate the relevance of the LTAS and wavelet features for SER in comparison to the more widely used prosodic and spectral features. Moreover, anger detection performance is compared for different combinations of the implemented prosodic, spectral and wavelet features. Feature ranking and anger detection performance analysis showed that both LTAS and wavelet features were relevant for Arabic SER and that they significantly improved emotion recognition rates.
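The abstract names LTAS and wavelet parameters without describing how they are computed. As a rough illustration only, the sketch below shows one generic way to obtain a long-term average spectrum and per-level wavelet energies with NumPy. The function names, frame length, hop size, Haar wavelet and number of levels are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ltas_db(signal, frame_len=512, hop=256):
    """Long-term average spectrum: mean power spectrum over all frames, in dB.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Average over time (frames), then convert to decibels.
    return 10.0 * np.log10(power.mean(axis=0) + 1e-12)

def haar_subband_energies(signal, levels=4):
    """Energies of a Haar wavelet decomposition (hypothetical feature set).

    Returns one detail energy per level, followed by the energy of the
    final approximation band.
    """
    approx = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        if len(approx) % 2:          # drop a trailing sample if length is odd
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        energies.append(float(np.sum(detail ** 2)))
    energies.append(float(np.sum(approx ** 2)))
    return np.array(energies)

# Example: a 1 kHz tone sampled at 16 kHz (synthetic, not EYASE data).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
features = np.concatenate([ltas_db(tone), haar_subband_energies(tone)])
```

Concatenating the two outputs gives a fixed-length vector per utterance that could be passed to a classifier alongside prosodic and MFCC features; how the paper actually combines its features is not stated in the abstract.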
Pages: 19-30
Number of pages: 12