Enhanced speech emotion detection using deep neural networks

Cited by: 0
Authors
S. Lalitha
Shikha Tripathi
Deepa Gupta
Affiliations
[1] Department of Electronics & Communication Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham
[2] Department of Computer Science & Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham
[3] PES University (formerly with Amrita Vishwa Vidyapeetham)
Source
International Journal of Speech Technology | Volume 22, 2019
Keywords
Arousal; BFCC; Cepstrum; DNN; Emotion detection; Perceptual features; Recognition accuracy; Valence;
DOI
Not available
Abstract
This paper investigates the effectiveness of perception-based speech features for emotion detection. The perceptual features considered are Mel frequency cepstral coefficients (MFCCs), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficients (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC). The algorithm built on these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work lies in analysing the perceptual features to identify the predominant ones that carry significant emotional information about the speaker. The validity of the algorithm is assessed on the publicly available Berlin database, both with seven emotions in a 1-dimensional (categorical) space and in a 2-dimensional continuous space of valence and arousal dimensions. Comparative analysis reveals that a considerable improvement in emotion recognition performance is obtained using a DNN with the identified combination of perceptual features.
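The pipeline the abstract describes (extract several cepstral feature streams, fuse them, classify with a DNN) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the random arrays stand in for real MFCC/BFCC/PLPC extractors, and the layer sizes, frame count and weights are assumptions chosen for demonstration.

```python
import numpy as np

# Hypothetical per-frame cepstral features; in the paper these would come
# from MFCC, BFCC, PLPC, etc. extractors applied to the speech signal.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))   # 100 frames x 13 coefficients
bfcc = rng.standard_normal((100, 13))
plpc = rng.standard_normal((100, 13))

# Fuse the perceptual features by frame-wise concatenation, then average
# over frames to obtain one utterance-level descriptor.
features = np.concatenate([mfcc, bfcc, plpc], axis=1)   # shape (100, 39)
x = features.mean(axis=0)                               # shape (39,)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Tiny feed-forward DNN: 39 inputs -> 64 hidden units -> 7 outputs
# (seven emotion classes, as in the Berlin database). Weights here are
# random placeholders; a real system would train them.
W1 = rng.standard_normal((39, 64)) * 0.1
b1 = np.zeros(64)
W2 = rng.standard_normal((64, 7)) * 0.1
b2 = np.zeros(7)

probs = softmax(relu(x @ W1 + b1) @ W2 + b2)   # class probabilities
predicted_emotion = int(probs.argmax())
```

The forward pass yields a probability over the seven categorical emotions; the valence/arousal setting would instead use a 2-dimensional regression head.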
Pages: 497–510
Page count: 13