Speech emotion recognition approaches: A systematic review

Cited by: 19
Authors
Hashem, Ahlam [1 ]
Arif, Muhammad [1 ]
Alghamdi, Manal [1 ]
Affiliations
[1] Umm Al Qura Univ, Dept Comp Sci, Al Abdiyah, Makkah, Saudi Arabia
Keywords
Speech emotion recognition; Emotional speech database; Classification of emotion; Speech features; Systematic review; TIME-COURSE; NEURAL-NETWORK; FEATURES; SELECTION; DOMAIN; REPRESENTATIONS; CLASSIFICATION; CLASSIFIERS; INFORMATION; PERFORMANCE
DOI
10.1016/j.specom.2023.102974
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The speech emotion recognition (SER) field has been active since it became a crucial feature of advanced Human-Computer Interaction (HCI), and it is now used in a wide range of real-life applications. In recent years, researchers have examined numerous SER systems, covering the availability of appropriate emotional databases, the selection of robust features, and the application of suitable classifiers using Machine Learning (ML) and Deep Learning (DL). Deep models have proved more accurate for SER than conventional ML techniques. Nevertheless, SER remains a challenging classification problem: separating similar emotional patterns requires a highly discriminative feature representation. To this end, this survey critically analyzes previous studies that recognize emotions from speech audio and reviews the current state of SER using DL. Through a systematic literature review based on selected keywords and covering 2012-2022, 96 papers were extracted that reflect the most recent findings and directions. Specifically, we cover databases (acted, evoked, and natural), features (prosodic, spectral, voice quality, and Teager energy operator), and the necessary preprocessing steps. Furthermore, different DL models and their performance are examined in depth. Based on our review, we also suggest SER aspects that could be considered in future work.
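To make the feature categories named in the abstract concrete, the sketch below extracts basic prosodic (pitch) and spectral (MFCC) descriptors from a speech clip. It is an illustrative assumption, not material from the reviewed paper: it presumes the librosa library is available and uses a hypothetical file name "speech.wav".

```python
# Minimal sketch of utterance-level SER features (assumed setup, not from the paper).
import numpy as np
import librosa

# Load a hypothetical speech clip as a mono waveform at 16 kHz.
y, sr = librosa.load("speech.wav", sr=16000)

# Spectral features: 13 MFCCs per frame, summarized by mean/std over time.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
spectral_vec = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Prosodic feature: fundamental frequency (F0) track via the YIN estimator.
f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)
prosodic_vec = np.array([np.nanmean(f0), np.nanstd(f0)])

# Fixed-length feature vector that an ML or DL classifier could consume.
features = np.concatenate([spectral_vec, prosodic_vec])
print(features.shape)  # (28,)
```

In practice, surveyed systems combine many more descriptors (voice quality, Teager energy operator) or learn representations directly from spectrograms with deep models; the sketch only shows the general feature-extraction pattern.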
Pages: 29