Survey on speech emotion recognition: Features, classification schemes, and databases

Times cited: 1273
Authors
El Ayadi, Moataz [1]
Kamel, Mohamed S. [2]
Karray, Fakhri [2]
Affiliations
[1] Cairo Univ, Giza 12613, Egypt
[2] Univ Waterloo, Waterloo, ON N2L 1V9, Canada
Keywords
Archetypal emotions; Speech emotion recognition; Statistical classifiers; Dimensionality reduction techniques; Emotional speech databases; Speaker identification; Linear prediction; Model; Expression; Prosody; System
DOI
10.1016/j.patcog.2010.09.020
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, increasing attention has been directed to the study of the emotional content of speech signals, and hence many systems have been proposed to identify the emotional content of a spoken utterance. This paper is a survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system. The first is the choice of suitable features for speech representation. The second is the design of an appropriate classification scheme, and the third is the proper preparation of an emotional speech database for evaluating system performance. Conclusions about the performance and limitations of current speech emotion recognition systems are discussed in the last section of this survey, which also suggests possible ways of improving such systems. (C) 2010 Elsevier Ltd. All rights reserved.
Pages: 572-587
Page count: 16